(57) Abstract

A method for selecting recombinant host cells expressing high levels of a desired protein is described. This method utilizes eukaryotic 
host cells harboring a DNA construct comprising a selectable gene (preferably an amplifiable gene) and a product gene provided 3' to
the selectable gene. The selectable gene is positioned within an intron defined by a splice donor site and a splice acceptor site and the 
selectable gene and product gene are under the transcriptional control of a single transcriptional regulatory region. The splice donor site 
is generally an efficient splice donor site and thereby regulates expression of the product gene using the transcriptional regulatory region. 
The transfected cells are cultured so as to express the gene encoding the product in a selective medium comprising an amplifying agent 
for sufficient time to allow amplification to occur, whereupon either the desired product is recovered or cells having multiple copies of the 
product gene are identified. 
METHOD FOR SELECTING HIGH-EXPRESSING HOST CELLS
BACKGROUND OF THE INVENTION 

Field of the Invention 

This invention relates to a method of selecting for high-expressing 
host cells, a method of producing a protein of interest in high yields and
a method of producing eukaryotic cells having multiple copies of a sequence 
encoding a protein of interest. 

Description of Background and Related Art 

The discovery of methods for introducing DNA into living host cells
in a functional form has provided the key to understanding many fundamental
biological processes, and has made possible the production of important 
proteins and other molecules in commercially useful quantities. 

Despite the general success of such gene transfer methods, several
common problems exist that may limit the efficiency with which a gene
encoding a desired protein can be introduced into and expressed in a host
cell. One problem is knowing when the gene has been successfully
transferred into recipient cells. A second problem is distinguishing
between those cells that contain the gene and those that have survived the
transfer procedures but do not contain the gene. A third problem is
identifying and isolating those cells that contain the gene and that are
expressing high levels of the protein encoded by the gene.

In general, the known methods for introducing genes into eukaryotic 
cells tend to be highly inefficient. Of the cells in a given culture, only 
a small proportion take up and express exogenously added DNA, and an even
smaller proportion stably maintain that DNA.

Identification of those cells that have incorporated a product gene
encoding a desired protein typically is achieved by introducing into the
same cells another gene, commonly referred to as a selectable gene, that
encodes a selectable marker. A selectable marker is a protein that is
necessary for the growth or survival of a host cell under the particular
culture conditions chosen, such as an enzyme that confers resistance to an
antibiotic or other drug, or an enzyme that compensates for a metabolic or
catabolic defect in the host cell. For example, selectable genes commonly
used with eukaryotic cells include the genes for aminoglycoside
phosphotransferase (APH), hygromycin phosphotransferase (hyg),
dihydrofolate reductase (DHFR), thymidine kinase (tk), neomycin resistance,
puromycin resistance, glutamine synthetase, and asparagine synthetase.

The method of identifying a host cell that has incorporated one gene
on the basis of expression by the host cell of a second incorporated gene
encoding a selectable marker is referred to as cotransfectation (or
cotransfection). In that method, a gene encoding a desired polypeptide and
a selection gene typically are introduced into the host cell
simultaneously, although they may be introduced sequentially. In the case
of simultaneous cotransfectation, the gene encoding the desired polypeptide
and the selectable gene may be present on a single DNA molecule or on
separate DNA molecules prior to being introduced into the host cells.
Wigler et al., Cell, 16:777 (1979). Cells that have incorporated the gene
encoding the desired polypeptide then are identified or isolated by
culturing the cells under conditions that preferentially allow for the
growth or survival of those cells that synthesize the selectable marker
encoded by the selectable gene.

The level of expression of a gene introduced into a eukaryotic host 
cell depends on multiple factors, including gene copy number, efficiency
of transcription, messenger RNA (mRNA) processing, stability, and
translation efficiency. Accordingly, high level expression of a desired 
polypeptide typically will involve optimizing one or more of those factors. 

For example, the level of protein production may be increased by
covalently joining the coding sequence of the gene to a "strong" promoter
or enhancer that will give high levels of transcription. Promoters and
enhancers are nucleotide sequences that interact specifically with proteins
in a host cell that are involved in transcription. Kriegler, Meth.
Enzymol., 185:512 (1990); Maniatis et al., Science, 236:1237 (1987).
Promoters are located upstream of the coding sequence of a gene and
facilitate transcription of the gene by RNA polymerase. Among the
eukaryotic promoters that have been identified as strong promoters for
high-level expression are the SV40 early promoter, adenovirus major late
promoter, mouse metallothionein-1 promoter, Rous sarcoma virus long
terminal repeat, and human cytomegalovirus immediate early promoter (CMV).

Enhancers stimulate transcription from a linked promoter. Unlike
promoters, enhancers are active when placed downstream from the
transcription initiation site or at considerable distances from the
promoter, although in practice enhancers may overlap physically and
functionally with promoters. For example, all of the strong promoters
listed above also contain strong enhancers. Bendig, Genetic Engineering,
7:91 (Academic Press, 1988).

The level of protein production also may be increased by increasing
the gene copy number in the host cell. One method for obtaining high gene
copy number is to directly introduce into the host cell multiple copies of
the gene, for example, by using a large molar excess of the product gene
relative to the selectable gene during cotransfectation. Kaufman, Meth.
Enzymol., 185:537 (1990). With this method, however, only a small
proportion of the cotransfected cells will contain the product gene at high
copy number. Furthermore, because no generally applicable, convenient
method exists for distinguishing such cells from the majority of cells that
contain fewer copies of the product gene, laborious and time-consuming
screening methods typically are required to identify the desired high-copy
number transfectants.

Another method for obtaining high gene copy number involves cloning
the gene in a vector that is capable of replicating autonomously in the
host cell. Examples of such vectors include mammalian expression vectors
derived from Epstein-Barr virus or bovine papilloma virus, and yeast
2-micron plasmid vectors. Stephens & Hentschel, Biochem. J., 248:1 (1987);
Yates et al., Nature, 313:812 (1985); Beggs, Genetic Engineering, 2:175
(Academic Press, 1981).

Yet another method for obtaining high gene copy number involves gene
amplification in the host cell. Gene amplification occurs naturally in
eukaryotic cells at a relatively low frequency. Schimke, J. Biol. Chem.,
263:5989 (1988). However, gene amplification also may be induced, or at
least selected for, by exposing host cells to appropriate selective
pressure. For example, in many cases it is possible to introduce a product
gene together with an amplifiable gene into a host cell and subsequently
select for amplification of the marker gene by exposing the cotransfected
cells to sequentially increasing concentrations of a selective agent.
Typically the product gene will be coamplified with the marker gene under
such conditions.

The most widely used amplifiable gene for that purpose is a DHFR
gene, which encodes a dihydrofolate reductase enzyme. The selection agent
used in conjunction with a DHFR gene is methotrexate (Mtx). A host cell
is cotransfected with a product gene encoding a desired protein and a DHFR
gene, and transfectants are identified by first culturing the cells in
culture medium that contains Mtx. A suitable host cell when a wild-type
DHFR gene is used is the Chinese Hamster Ovary (CHO) cell line deficient
in DHFR activity, prepared and propagated as described by Urlaub & Chasin,
Proc. Nat. Acad. Sci. USA, 77:4216 (1980). The transfected cells then are
exposed to successively higher amounts of Mtx. This leads to the synthesis
of multiple copies of the DHFR gene, and concomitantly, multiple copies of
the product gene. Schimke, J. Biol. Chem., 263:5989 (1988); Axel et al.,
U.S. Patent No. 4,399,216; Axel et al., U.S. Patent No. 4,634,665. Other
references directed to co-transfection of a gene together with a genetic
marker that allows for selection and subsequent amplification include
Kaufman in Genetic Engineering, ed. J. Setlow (Plenum Press, New York),
Vol. 9 (1987); Kaufman and Sharp, J. Mol. Biol., 159:601 (1982); Ringold
et al., J. Mol. Appl. Genet., 1:165-175 (1981); Kaufman et al., Mol. Cell
Biol., 5:1750-1759 (1985); Kaetzel and Nilson, J. Biol. Chem., 263:6244-
6251 (1988); Hung et al., Proc. Natl. Acad. Sci. USA, 83:261-264 (1986);
Kaufman et al., EMBO J., 6:87-93 (1987); Johnston and Kucey, Science,
242:1551-1554 (1988); Urlaub et al., Cell, 33:405-412 (1983).

To extend the DHFR amplification method to other cell types, a mutant
DHFR gene that encodes a protein with reduced sensitivity to methotrexate
may be used in conjunction with host cells that contain normal numbers of
an endogenous wild-type DHFR gene. Simonsen and Levinson, Proc. Natl.
Acad. Sci. USA, 80:2495 (1983); Wigler et al., Proc. Natl. Acad. Sci. USA,
77:3567-3570 (1980); Haber and Schimke, Somatic Cell Genetics, 8:499-508
(1982).

Alternatively, host cells may be co-transfected with the product
gene, a DHFR gene, and a dominant selectable gene, such as a neo^r gene. Kim
and Wold, Cell, 42:129 (1985); Capon et al., U.S. Pat. No. 4,965,199.
Transfectants are identified by first culturing the cells in culture medium
containing neomycin (or the related drug G418), and the transfectants so
identified then are selected for amplification of the DHFR gene and the
product gene by exposure to successively increasing amounts of Mtx.

As will be appreciated from this discussion, the selection of
recombinant host cells that express high levels of a desired protein
generally is a multi-step process. In the first step, initial
transfectants are selected that have incorporated the product gene and the
selectable gene. In subsequent steps, the initial transfectants are
subjected to further selection for high-level expression of the selectable
gene and then random screening for high-level expression of the product
gene. To identify cells expressing high levels of the desired protein,
typically one must screen large numbers of transfectants. The majority of
transfectants produce less than maximal levels of the desired protein.
Further, Mtx resistance in DHFR transformants is at least partially
conferred by varying degrees of gene amplification. Schimke, Cell, 37:705-
713 (1984). The inadequacies of co-expression of the non-selected gene
have been reported by Wold et al., Proc. Natl. Acad. Sci. USA, 76:5684-5688
(1979). Instability of the amplified DNA is reported by Kaufman and
Schimke, Mol. Cell Biol., 1:1069-1076 (1981); Haber and Schimke, Cell,
26:355-362 (1981); and Federspiel et al., J. Biol. Chem., 259:9127-9140
(1984).

Several methods have been described for directly selecting such
recombinant host cells in a single step. One strategy involves co-
transfecting host cells with a product gene and a DHFR gene, and selecting
those cells that express high levels of DHFR by directly culturing in
medium containing a high concentration of Mtx. Many of the cells selected
in that manner also express the co-transfected product gene at high levels.
Page and Sydenham, Bio/Technology, 9:64 (1991). This method for single-step
selection suffers from certain drawbacks that limit its usefulness. High-
expressing cells obtained by direct culturing in medium containing a high
level of a selection agent may have poor growth and stability
characteristics, thus limiting their usefulness for long-term production
processes. Page and Sydenham, Bio/Technology, 9:64 (1991). Single-step
selection for high-level resistance to Mtx may produce cells with an
altered, Mtx-resistant DHFR enzyme, or cells that have altered Mtx
transport properties, rather than cells containing amplified genes. Haber
et al., J. Biol. Chem., 256:9501 (1981); Assaraf and Schimke, Proc. Natl.
Acad. Sci. USA, 84:7154 (1987).

Another method involves the use of polycistronic mRNA expression
vectors containing a product gene at the 5' end of the transcribed region
and a selectable gene at the 3' end. Because translation of the selectable
gene at the 3' end of the polycistronic mRNA is inefficient, such vectors
exhibit preferential translation of the product gene and require high
levels of polycistronic mRNA to survive selection. Kaufman, Meth.
Enzymol., 185:487 (1990); Kaufman, Meth. Enzymol., 185:537 (1990); Kaufman
et al., EMBO J., 6:187 (1987). Accordingly, cells expressing high levels
of the desired protein product may be obtained in a single step by
culturing the initial transfectants in medium containing a selection agent
appropriate for use with the particular selectable gene. However, the
utility of these vectors is variable because of the unpredictable influence
of the upstream product reading frame on selectable marker translation and
because the upstream reading frame sometimes becomes deleted during
methotrexate amplification (Kaufman et al., J. Mol. Biol., 159:601-621
[1982]; Levinson, Methods in Enzymology, San Diego: Academic Press, Inc.
[1990]). Later vectors incorporated an internal translation initiation site
derived from members of the picornavirus family which is positioned between
the product gene and the selectable gene (Pelletier et al., Nature, 334:320
[1988]; Jang et al., J. Virol., 63:1651 [1989]).

A third method for single-step selection involves use of a DNA
construct with a selectable gene containing an intron within which is
located a gene encoding the protein of interest. See U.S. Patent No.
5,043,270 and Abrams et al., J. Biol. Chem., 264(24):14016-14021 (1989).
In yet another single-step selection method, host cells are co-transfected
with an intron-modified selectable gene and a gene encoding the protein of
interest. See WO 92/17566, published October 15, 1992. The intron-
modified gene is prepared by inserting into the transcribed region of a
selectable gene an intron of such length that the intron is correctly
spliced from the corresponding mRNA precursor at low efficiency, so that
the amount of selectable marker produced from the intron-modified
selectable gene is substantially less than that produced from the starting
selectable gene. These vectors help to ensure the integrity of the
integrated DNA construct, but transcriptional linkage is not achieved as
the selectable gene and the protein gene are driven by separate promoters.

Other mammalian expression vectors that have single transcription
units have been described. Retroviral vectors have been constructed (Cepko
et al., Cell, 37:1053-1062 [1984]) in which a cDNA is inserted between the
endogenous Moloney murine leukemia virus (M-MuLV) splice donor and splice
acceptor sites which are followed by a neomycin resistance gene. This
vector has been used to express a variety of gene products following
retroviral infection of several cell types.

With the above drawbacks in mind, it is one object of the present
invention to increase the level of homogeneity with regard to expression
levels of stable clones transfected with a product gene of interest, by
expressing a selectable marker (DHFR) and the protein of interest from a
single promoter.

It is another object to provide a method for selecting stable,
recombinant host cells that express high levels of a desired protein
product, which method is rapid and convenient to perform, and reduces the
numbers of transfected cells which need to be screened. Furthermore, it is
an object to allow high levels of single- and two-unit polypeptides to be
rapidly generated from clones or pools of stable host cell transfectants.

It is an additional object to provide expression vectors which bias
for active integration events (i.e. have an increased tendency to generate
transformants wherein the DNA construct is inserted into a region of the
genome of the host cell which results in high level expression of the
product gene) and can accommodate a variety of product genes without the
need for modification.

SUMMARY OF THE INVENTION

Accordingly, the present invention is directed to a DNA construct
(alternatively termed a DNA molecule) comprising a 5' transcriptional
initiation site and a 3' transcriptional termination site, a selectable
gene (preferably an amplifiable gene) and a product gene provided 3' to the
selectable gene, a transcriptional regulatory region regulating
transcription of both the selectable gene and the product gene, the
selectable gene positioned within an intron defined by a splice donor site
and a splice acceptor site. The splice donor site preferably comprises an
effective splice donor sequence as herein defined and thereby regulates
expression of the product gene using the transcriptional regulatory region.
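
By way of illustration only, the 5' to 3' arrangement of elements recited above can be sketched as a small data model. The element names, the Element class, and the check_layout function are hypothetical conveniences introduced for this sketch and are not part of the disclosure; the example labels (SD, DHFR, SA, tPA) are borrowed from the constructs named in the figure descriptions, and "promoter" is a generic placeholder.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical element kinds for this sketch, listed 5' to 3'.
REQUIRED_KINDS = (
    "regulatory_region",
    "splice_donor",
    "selectable_gene",
    "splice_acceptor",
    "product_gene",
    "termination_site",
)


@dataclass(frozen=True)
class Element:
    kind: str        # one of REQUIRED_KINDS
    label: str = ""  # free-text label, e.g. "DHFR"


def check_layout(elements: List[Element]) -> List[str]:
    """Return a list of layout problems; an empty list means the ordering
    matches the arrangement described in the text."""
    kinds = [e.kind for e in elements]
    problems = [
        f"expected exactly one {k}" for k in REQUIRED_KINDS if kinds.count(k) != 1
    ]
    if problems:
        return problems
    pos = {k: kinds.index(k) for k in REQUIRED_KINDS}
    # The selectable gene lies within the intron defined by the SD and SA sites.
    if not pos["splice_donor"] < pos["selectable_gene"] < pos["splice_acceptor"]:
        problems.append("selectable gene is not within the intron")
    # The product gene is provided 3' to the selectable gene.
    if not pos["selectable_gene"] < pos["product_gene"]:
        problems.append("product gene is not 3' to the selectable gene")
    # A single regulatory region lies 5' to both genes.
    if not pos["regulatory_region"] < min(pos["selectable_gene"], pos["product_gene"]):
        problems.append("regulatory region is not 5' to both genes")
    # The termination site lies 3' to both genes.
    if not pos["termination_site"] > max(pos["selectable_gene"], pos["product_gene"]):
        problems.append("termination site is not 3' to both genes")
    return problems


example = [
    Element("regulatory_region", "promoter"),
    Element("splice_donor", "SD"),
    Element("selectable_gene", "DHFR"),
    Element("splice_acceptor", "SA"),
    Element("product_gene", "tPA"),
    Element("termination_site"),
]
```

Calling check_layout(example) on the ordering above yields an empty list, whereas placing the product gene inside the intron in place of the selectable gene is flagged.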

In another embodiment, the invention provides a method for producing 
a product of interest comprising culturing a eukaryotic cell which has been 
transfected with the DNA construct described above, so as to express the
product gene and recovering the product . 

In a further embodiment, the invention provides a method for
producing eukaryotic cells having multiple copies of the product gene
comprising transfecting eukaryotic cells with the DNA construct described
above (where the selectable gene is an amplifiable gene), growing the cells
in a selective medium comprising an amplifying agent for a sufficient time
for amplification to occur, and selecting cells having multiple copies of
the product gene. Preferably transfection of the cells is achieved using
electroporation.

After transfection of the host cells, most of the transfectants fail
to exhibit the selectable phenotype characteristic of the protein encoded
by the selectable gene, but surprisingly a small proportion of the
transfectants do exhibit the selectable phenotype, and among those
transfectants, the majority are found to express high levels of the desired
product encoded by the product gene. Thus, the invention provides an
improved method for the selection of recombinant host cells expressing high
levels of a desired product, which method is useful with a wide variety of
eukaryotic host cells and avoids the problems inherent in existing cell
selection technology.
BRIEF DESCRIPTION OF THE DRAWINGS
Figures 1A-1D illustrate schematically various DNA constructs
encompassed by the instant invention. The large arrows represent the
selectable gene and the product gene; the V formed by the dashed lines
shows the region of the precursor RNA internal to the 5' splice donor site
(SD) and 3' splice acceptor site (SA) that is excised from vectors that
contain a functional SD. The transcriptional regulatory region, selectable
gene, product gene and transcriptional termination site are depicted in
Figure 1A. Figure 1B depicts the DNA constructs of Example 1. The various
splice donor sequences are depicted, i.e., wild type ras splice donor
sequence (WT ras), mutant ras splice donor sequence (MUTANT ras) and non-
functional splice donor sequence (ΔGT). The probes used for Northern blot
analysis in Example 1 are shown in Figure 1B. Figure 1C depicts the DNA
constructs of Example 2 and Figure 1D depicts the DNA construct of Example
3 used for expression of anti-IgE VH.

Figure 2 depicts schematically the control DNA construct used in 
Example 1. 

Figures 3A-Q depict the nucleotide sequence (SEQ ID NO: 1) of the
DHFR/intron-(WT ras SD)-tPA expression vector of Example 1.

Figure 4 is a bar graph which shows the number of colonies that form
in selective medium after electroporation of linearized duplicate miniprep
DNAs prepared in parallel from the three vectors shown in Figure 1B (i.e.
with wild type ras splice donor sequence [WT ras], mutant ras splice donor
sequence [MUTANT ras] and non-functional splice donor sequence [ΔGT]) and
from the control vector that has DHFR under control of SV40 promoter and
tPA under control of CMV promoter (see Figure 2). Cells were selected in
nucleoside-free medium and counted with an automated colony counter.

Figures 5A-C are bar graphs depicting expression of tPA from stable
pools and clones generated from the vectors shown in Figure 1B. In Figure
5A greater than 100 clones from each vector transfection were mixed, plated
in 24 well plates, and assayed by tPA ELISA at "saturation". In Figure 5B,
twenty clones chosen at random derived from each of the vectors were
assayed by tPA ELISA at "saturation". In Figure 5C, the pools mentioned in
Figure 5A (except the ΔGT pool) were exposed to 200 nM Mtx to select for
DHFR amplification and then pooled and assayed for tPA expression.

Figures 6A-P depict the nucleotide sequence (SEQ ID NO: 2) of the 
DHFR/intron-(WT ras SD)-TNFr-IgG expression vector of Example 2.

Figures 7A-B are bar graphs depicting expression of TNFr-IgG using
dicistronic or control vectors (see Example 2). Vectors containing
TNFr-IgG (but otherwise identical to those described for tPA expression in
Example 1) were constructed (see Figure 1C), introduced into dp12.CHO cells
by electroporation, pooled, and assayed for product expression before
(Figure 7A) and after (Figure 7B) being subjected to amplification in 200 nM
Mtx.

Figure 8 depicts schematically the DNA construct used for expression
of the VL of anti-IgE in Example 3.
Figures 9A-O depict the nucleotide sequence (SEQ ID NO: 3) of the
anti-IgE VH expression vector of Example 3.

Figures 10A-Q depict the nucleotide sequence (SEQ ID NO: 4) of the
anti-IgE VL expression vector of Example 3.

Figure 11 is a bar graph depicting anti-IgE expression in Example 3.
Heavy (VH) and light (VL) chain expression vectors were constructed and co-
electroporated into CHO cells; clones were selected and assayed for
antibody expression. Additionally, pools were established and assessed
with regard to expression before and after Mtx selection at 200 nM and 1 μM.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Definitions:

The "DNA construct" disclosed herein comprises a non-naturally 
occurring DNA molecule which can either be provided as an isolate or
integrated in another DNA molecule, e.g. in an expression vector or the
chromosome of a eukaryotic host cell.

The term "selectable gene" as used herein refers to a DNA that 
encodes a selectable marker necessary for the growth or survival of a host 
cell under the particular cell culture conditions chosen. Accordingly, a 
host cell that is transformed with a selectable gene will be capable of 
growth or survival under certain cell culture conditions wherein a
non-transfected host cell is not capable of growth or survival. Typically, a
selectable gene will confer resistance to a drug or compensate for a
metabolic or catabolic defect in the host cell. Examples of selectable
genes are provided in the following table. See also Kaufman, Methods in
Enzymology, 185:537-566 (1990), for a review of these.



TABLE 1

Selectable Genes and their Selection Agents

Selection Agent                                             Selectable Gene
----------------------------------------------------------  -------------------------------------------
Methotrexate                                                Dihydrofolate reductase
Cadmium                                                     Metallothionein
PALA                                                        CAD
Xyl-A or adenosine and 2'-deoxycoformycin                   Adenosine deaminase
Adenine, azaserine, and coformycin                          Adenylate deaminase
6-Azauridine, pyrazofuran                                   UMP synthetase
Mycophenolic acid                                           IMP 5'-dehydrogenase
Mycophenolic acid with limiting xanthine                    Xanthine-guanine phosphoribosyltransferase
Hypoxanthine, aminopterin, and thymidine (HAT)              Mutant HGPRTase or mutant thymidine kinase
5-Fluorodeoxyuridine                                        Thymidylate synthetase
Multiple drugs e.g. adriamycin, vincristine or colchicine   P-glycoprotein 170
Aphidicolin                                                 Ribonucleotide reductase
Methionine sulfoximine                                      Glutamine synthetase
β-Aspartyl hydroxamate or Albizziin                         Asparagine synthetase
Canavanine                                                  Arginosuccinate synthetase
α-Difluoromethylornithine                                   Ornithine decarboxylase
Compactin                                                   HMG-CoA reductase
Tunicamycin                                                 N-Acetylglucosaminyl transferase
Borrelidin                                                  Threonyl-tRNA synthetase
Ouabain                                                     Na+K+-ATPase



The preferred selectable gene is an amplifiable gene. As used herein,
the term "amplifiable gene" refers to a gene which is amplified (i.e.
additional copies of the gene are generated which survive in
intrachromosomal or extrachromosomal form) under certain conditions. The
amplifiable gene usually encodes an enzyme (i.e. an amplifiable marker)
which is required for growth of eukaryotic cells under those conditions.
For example, the gene may encode DHFR which is amplified when a host cell
transformed therewith is grown in Mtx. According to Kaufman, the selectable
genes in Table 1 above can also be considered amplifiable genes. An example
of a selectable gene which is generally not considered to be an amplifiable
gene is the neomycin resistance gene (Cepko et al., supra).
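
By way of illustration only, the pairing of a selectable (amplifiable) gene with its selection agent can be sketched as a simple lookup. Only a handful of Table 1 rows are transcribed here, and the names SELECTION_AGENTS and selection_agent_for are hypothetical, introduced solely for this sketch.

```python
# A few gene/agent pairs transcribed from Table 1 (not the complete table).
# Keys are lower-cased gene names; "alpha-" stands in for the Greek letter.
SELECTION_AGENTS = {
    "dihydrofolate reductase": "methotrexate",
    "metallothionein": "cadmium",
    "glutamine synthetase": "methionine sulfoximine",
    "ornithine decarboxylase": "alpha-difluoromethylornithine",
    "hmg-coa reductase": "compactin",
}


def selection_agent_for(gene: str) -> str:
    """Return the selection agent paired with a selectable gene.

    Matching ignores case and collapses repeated whitespace. A KeyError is
    raised for genes not transcribed above, such as the neomycin resistance
    gene, which the text notes is generally not considered amplifiable.
    """
    key = " ".join(gene.lower().split())
    try:
        return SELECTION_AGENTS[key]
    except KeyError:
        raise KeyError(f"no Table 1 entry transcribed for {gene!r}") from None
```

For instance, selection_agent_for("Dihydrofolate reductase") returns "methotrexate", consistent with the DHFR/Mtx pairing used throughout this description.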

As used herein, "selective medium" refers to nutrient solution used
for growing eukaryotic cells which have the selectable gene and therefore
includes a "selection agent". Commercially available media such as Ham's
F10 (Sigma), Minimal Essential Medium ([MEM], Sigma), RPMI-1640 (Sigma),
and Dulbecco's Modified Eagle's Medium ([DMEM], Sigma) are exemplary
nutrient solutions. In addition, any of the media described in Ham and
Wallace, Meth. Enz., 58:44 (1979), Barnes and Sato, Anal. Biochem., 102:255
(1980), U.S. Patent Nos. 4,767,704; 4,657,866; 4,927,762; or 4,560,655; WO
90/03430; WO 87/00195; U.S. Patent Re. 30,985; or U.S. Patent No.
5,122,469, the disclosures of all of which are incorporated herein by
reference, may be used as culture media. Any of these media may be
supplemented as necessary with hormones and/or other growth factors (such
as insulin, transferrin, or epidermal growth factor), salts (such as sodium
chloride, calcium, magnesium, and phosphate), buffers (such as HEPES),
nucleosides (such as adenosine and thymidine), antibiotics (such as
Gentamycin™ drug), trace elements (defined as inorganic compounds usually
present at final concentrations in the micromolar range), and glucose or
an equivalent energy source. Any other necessary supplements may also be
included at appropriate concentrations that would be known to those skilled
in the art. The preferred nutrient solution comprises fetal bovine serum.

The term "selection agent" refers to a substance that interferes with
the growth or survival of a host cell that is deficient in a particular
selectable gene. Examples of selection agents are presented in Table 1
above. The selection agent preferably comprises an "amplifying agent" which
is defined for purposes herein as an agent for amplifying copies of the
amplifiable gene, such as Mtx if the amplifiable gene is DHFR. See Table
1 for examples of amplifying agents.

As used herein, the term "transcriptional initiation site" refers to 
the nucleic acid in the DNA construct corresponding to the first nucleic 
acid incorporated into the primary transcript, i.e., the mRNA precursor, 
which site is generally provided at, or adjacent to, the 5' end of the DNA
construct.

The term "transcriptional termination site" refers to a sequence of 
DNA, normally represented at the 3' end of the DNA construct, that causes 
RNA polymerase to terminate transcription. 

As used herein, "transcriptional regulatory region" refers to a
region of the DNA construct that regulates transcription of the selectable
gene and the product gene. The transcriptional regulatory region normally
refers to a promoter sequence (i.e. a region of DNA involved in binding of
RNA polymerase to initiate transcription) which can be constitutive or
inducible and, optionally, an enhancer (i.e. a cis-acting DNA element,
usually from about 10-300 bp, that acts on a promoter to increase its
transcription).

As used herein, "product gene" refers to DNA that encodes a desired
protein or polypeptide product. Any product gene that is capable of
expression in a host cell may be used, although the methods of the
invention are particularly suited for obtaining high-level expression of
a product gene that is not also a selectable or amplifiable gene.
Accordingly, the protein or polypeptide encoded by a product gene typically
will be one that is not necessary for the growth or survival of a host cell
under the particular cell culture conditions chosen. For example, product
genes suitably encode a peptide, or may encode a polypeptide sequence of
amino acids for which the chain length is sufficient to produce higher
levels of tertiary and/or quaternary structure.

Examples of bacterial polypeptides or proteins include, e.g.,
alkaline phosphatase and β-lactamase. Examples of mammalian polypeptides
or proteins include molecules such as renin; a growth hormone, including
human growth hormone, and bovine growth hormone; growth hormone releasing
factor; parathyroid hormone; thyroid stimulating hormone; lipoproteins;
alpha-1-antitrypsin; insulin A-chain; insulin B-chain; proinsulin; follicle
stimulating hormone; calcitonin; luteinizing hormone; glucagon; clotting
factors such as factor VIIIC, factor IX, tissue factor, and von Willebrand's
factor; anti-clotting factors such as Protein C; atrial natriuretic factor;
lung surfactant; a plasminogen activator, such as urokinase or human urine
or tissue-type plasminogen activator (t-PA); bombesin; thrombin;
hemopoietic growth factor; tumor necrosis factor-alpha and -beta;
enkephalinase; RANTES (regulated on activation normally T-cell expressed
and secreted); human macrophage inflammatory protein (MIP-1-alpha); a serum
albumin such as human serum albumin; Mullerian-inhibiting substance;
relaxin A-chain; relaxin B-chain; prorelaxin; mouse gonadotropin-associated
peptide; a microbial protein, such as beta-lactamase; DNase; inhibin;
activin; vascular endothelial growth factor (VEGF); receptors for hormones
or growth factors; integrin; protein A or D; rheumatoid factors; a
neurotrophic factor such as brain-derived neurotrophic factor (BDNF),
neurotrophin-3, -4, -5, or -6 (NT-3, NT-4, NT-5, or NT-6), or a nerve
growth factor such as NGF-β; platelet-derived growth factor (PDGF);
fibroblast growth factor such as aFGF and bFGF; epidermal growth factor
(EGF); transforming growth factor (TGF) such as TGF-alpha and TGF-beta,
including TGF-β1, TGF-β2, TGF-β3, TGF-β4, or TGF-β5; insulin-like growth
factor-I and -II (IGF-I and IGF-II); des(1-3)-IGF-I (brain IGF-I); insulin-
like growth factor binding proteins; CD proteins such as CD-3, CD-4, CD-8,
and CD-19; erythropoietin; osteoinductive factors; immunotoxins; a bone
morphogenetic protein (BMP); an interferon such as interferon-alpha, -beta,
and -gamma; colony stimulating factors (CSFs), e.g., M-CSF, GM-CSF, and
G-CSF; interleukins (ILs), e.g., IL-1 to IL-10; superoxide dismutase; T-cell
receptors; surface membrane proteins; decay accelerating factor; viral
antigen such as, for example, a portion of the AIDS envelope; transport
proteins; homing receptors; addressins; regulatory proteins; antibodies;
chimeric proteins such as immunoadhesins; and fragments of any of the
above-listed polypeptides.

The product gene preferably does not consist of an anti-sense
sequence for inhibiting the expression of a gene present in the host.
Preferred proteins herein are therapeutic proteins such as TGF-β, TGF-α,
PDGF, EGF, FGF, IGF-I, DNase, plasminogen activators such as t-PA, clotting
factors such as tissue factor and factor VIII, hormones such as relaxin and
insulin, cytokines such as IFN-γ, chimeric proteins such as TNF receptor
IgG immunoadhesin (TNFr-IgG) or antibodies such as anti-IgE.

The term "intron" as used herein refers to a nucleotide sequence
present within the transcribed region of a gene or within a messenger RNA
precursor, which nucleotide sequence is capable of being excised, or
spliced, from the messenger RNA precursor by a host cell prior to
translation. Introns suitable for use in the present invention are
suitably prepared by any of several methods that are well known in the art,
such as purification from a naturally occurring nucleic acid or de novo
synthesis. The introns present in many naturally occurring eukaryotic
genes have been identified and characterized. Mount, Nuc. Acids Res.,
10:459 (1982). Artificial introns comprising functional splice sites also
have been described. Winey et al., Mol. Cell Biol., 9:329 (1989);
Gatermann et al., Mol. Cell Biol., 9:1526 (1989). Introns may be obtained
from naturally occurring nucleic acids, for example, by digestion of a
naturally occurring nucleic acid with a suitable restriction endonuclease,
or by PCR cloning using primers complementary to sequences at the 5' and
3' ends of the intron. Alternatively, introns of defined sequence and
length may be prepared synthetically using various methods in organic
chemistry. Narang et al., Meth. Enzymol., 68:90 (1979); Caruthers et al.,
Meth. Enzymol., 154:287 (1985); Froehler et al., Nuc. Acids Res., 14:5399
(1986).

As used herein "splice donor site" or "SD" refers to the DNA sequence
immediately surrounding the exon-intron boundary at the 5' end of the
intron, where the "exon" comprises the nucleic acid 5' to the intron. Many
splice donor sites have been characterized and Ohshima et al., J. Mol.
Biol., 195:247-259 (1987) provides a review of these. An "efficient splice
donor sequence" refers to a nucleic acid sequence encoding a splice donor
site wherein the efficiency of splicing of messenger RNA precursors having
the splice donor sequence is between about 80 and 99%, and preferably 90 to
95%, as determined by quantitative PCR. Examples of efficient splice donor
sequences include the wild type (WT) ras splice donor sequence and the
GAC:GTAAGT sequence of Example 3. Other efficient splice donor sequences
can be readily selected using the techniques for measuring the efficiency
of splicing disclosed herein.

The terms "PCR" and "polymerase chain reaction" as used herein refer
to the in vitro amplification method described in US Patent No. 4,683,195
(issued July 28, 1987). In general, the PCR method involves repeated
cycles of primer extension synthesis, using two DNA primers capable of
hybridizing preferentially to a template nucleic acid comprising the
nucleotide sequence to be amplified. The PCR method can be used to clone
specific DNA sequences from total genomic DNA, cDNA transcribed from
cellular RNA, viral or plasmid DNAs. Wang & Mark, in PCR Protocols, pp.
70-75 (Academic Press, 1990); Scharf, in PCR Protocols, pp. 84-98; Kawasaki
& Wang, in PCR Technology, pp. 89-97 (Stockton Press, 1989). Reverse
transcription-polymerase chain reaction (RT-PCR) can be used to analyze RNA
samples containing mixtures of spliced and unspliced mRNA transcripts.
Fluorescently tagged primers designed to span the intron are used to
amplify both spliced and unspliced targets. The resultant amplification
products are then separated by gel electrophoresis and quantitated by
measuring the fluorescent emission of the appropriate band(s). A
comparison is made to determine the amount of spliced and unspliced
transcripts present in the RNA sample.
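
By way of illustration only, the comparison of spliced and unspliced
band signals can be sketched in a few lines of Python. The function
names, the example intensities, and the assumption that fluorescent
emission is proportional to the amount of each amplification product are
hypothetical and are not taken from the specification; the 80 to 99%
window is the range given herein for an efficient splice site.

```python
def splicing_efficiency(spliced_signal, unspliced_signal):
    """Return the percentage of transcripts that are spliced.

    Assumes (hypothetically) that each band's fluorescent emission is
    proportional to the amount of the corresponding RT-PCR product.
    """
    total = spliced_signal + unspliced_signal
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * spliced_signal / total


def is_efficient(efficiency_pct, low=80.0, high=99.0):
    """Check an efficiency against the 'about 80 to 99%' window."""
    return low <= efficiency_pct <= high


# Illustrative (made-up) band intensities in arbitrary fluorescence units.
eff = splicing_efficiency(spliced_signal=920.0, unspliced_signal=80.0)
print(f"{eff:.1f}% spliced; efficient: {is_efficient(eff)}")
```

In practice the two products differ in length, so a correction for
amplification bias may be needed before such a ratio is relied upon.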

One preferred splice donor sequence is a "consensus splice donor
sequence". The nucleotide sequences surrounding intron splice sites, which
sequences are evolutionarily highly conserved, are referred to as
"consensus splice donor sequences". In the mRNAs of higher eukaryotes, the
5' splice site occurs within the consensus sequence AG:GUAAGU (wherein the
colon denotes the site of cleavage and ligation). In the mRNAs of yeast,
the 5' splice site is bounded by the consensus sequence :GUAUGU. Padgett,
et al., Ann. Rev. Biochem., 55:1119 (1986).
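
As a rough sketch only, a candidate splice donor can be compared
position by position with the consensus sequences quoted above
(AG:GUAAGU for higher eukaryotes and :GUAUGU for yeast). The function
name, the dictionary keys, and the simple match count are hypothetical
choices made for illustration; a match count is not a measure of
splicing efficiency, which is determined experimentally as described
herein.

```python
# Consensus sequences quoted in the text, split at the cleavage site
# (the colon): (exon side, intron side).
CONSENSUS_BY_HOST = {
    "higher_eukaryote": ("AG", "GUAAGU"),
    "yeast": ("", "GUAUGU"),
}


def donor_matches(exon_end, intron_start, host="higher_eukaryote"):
    """Count positions at which a candidate donor matches the consensus.

    exon_end is the sequence immediately 5' to the cleavage site and
    intron_start is the sequence immediately 3' to it; DNA or RNA
    letters are accepted. Returns (matches, consensus_length).
    """
    exon_cons, intron_cons = CONSENSUS_BY_HOST[host]
    exon = exon_end.upper().replace("T", "U")
    intron = intron_start.upper().replace("T", "U")
    if len(exon) < len(exon_cons) or len(intron) < len(intron_cons):
        raise ValueError("flanking sequence is shorter than the consensus")
    exon_tail = exon[len(exon) - len(exon_cons):]
    observed = exon_tail + intron[:len(intron_cons)]
    expected = exon_cons + intron_cons
    matches = sum(o == e for o, e in zip(observed, expected))
    return matches, len(expected)


print(donor_matches("CAG", "GTAAGT"))                 # (8, 8)
print(donor_matches("GAC", "GTAAGT"))                 # (7, 8)
print(donor_matches("AAA", "GTATGT", host="yeast"))   # (6, 6)
```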

The expression "splice acceptor site" or "SA" refers to the sequence
immediately surrounding the intron-exon boundary at the 3' end of the
intron, where the "exon" comprises the nucleic acid 3' to the intron. Many
splice acceptor sites have been characterized and Ohshima et al., J. Mol.
Biol., 195:247-259 (1987) provides a review of these. The preferred splice
acceptor site is an efficient splice acceptor site, which refers to a
nucleic acid sequence encoding a splice acceptor site wherein the
efficiency of splicing of messenger RNA precursors having the splice
acceptor site is between about 80 and 99%, and preferably 90 to 95%, as
determined by quantitative PCR. The splice acceptor site may comprise a
consensus sequence. In the mRNAs of higher eukaryotes, the 3' splice
acceptor site occurs within the consensus sequence (U/C)11NCAG:G. In the
mRNAs of yeast, the 3' acceptor splice site is bounded by the consensus
sequence (C/U)AG:. Padgett, et al., supra.
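
Purely as an illustration, candidate acceptor sites can be located in a
transcript by pattern matching. The sketch below assumes that the
higher-eukaryote consensus is read as a run of eleven pyrimidines (U or
C), any one nucleotide, then CAG, with cleavage immediately before a G;
that reading, the function name, and the toy sequence are assumptions
made for the example.

```python
import re

# Eleven pyrimidines, any nucleotide, CAG, followed by (but not
# consuming) the first exon nucleotide G.
ACCEPTOR_PATTERN = re.compile(r"[UC]{11}[ACGU]CAG(?=G)")


def find_acceptor_sites(rna):
    """Return 0-based indices of the first exon nucleotide after each
    candidate 3' splice site (DNA or RNA letters are accepted)."""
    rna = rna.upper().replace("T", "U")
    return [m.end() for m in ACCEPTOR_PATTERN.finditer(rna)]


# Toy sequence: 3 nt, 11 pyrimidines, N = A, CAG, then exon "GUU".
toy = "AAA" + "UC" * 5 + "U" + "ACAG" + "GUU"
print(find_acceptor_sites(toy))  # [18]
```

A pattern match of this kind identifies only sequences resembling the
consensus; whether a given site is used efficiently is determined
experimentally.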

As used herein "culturing for sufficient time to allow amplification
to occur" refers to the act of physically culturing the eukaryotic host
cells which have been transformed with the DNA construct in cell culture
media containing the amplifying agent, until the copy number of the
amplifiable gene (and preferably also the copy number of the product gene)
in the host cells has increased relative to the transformed cells prior to
this culturing.

The term "expression" as used herein refers to transcription or
translation occurring within a host cell. The level of expression of a
product gene in a host cell may be determined on the basis of either the
amount of corresponding mRNA that is present in the cell or the amount of
the protein encoded by the product gene that is produced by the cell. For
example, mRNA transcribed from a product gene is desirably quantitated by
northern hybridization. Sambrook, et al., Molecular Cloning: A Laboratory
Manual, pp. 7.3-7.57 (Cold Spring Harbor Laboratory Press, 1989). Protein
encoded by a product gene can be quantitated either by assaying for the
biological activity of the protein or by employing assays that are
independent of such activity, such as western blotting or radioimmunoassay
using antibodies that are capable of reacting with the protein. Sambrook,
et al., Molecular Cloning: A Laboratory Manual, pp. 18.1-18.88 (Cold Spring
Harbor Laboratory Press, 1989).

Modes for Carrying Out the Invention 

Methods and compositions are provided for enhancing the stability
and/or copy number of a transcribed sequence in order to allow for elevated
levels of an RNA sequence of interest. In general, the methods of the
present invention involve transfecting a eukaryotic host cell with an
expression vector comprising both a product gene encoding a desired
polypeptide and a selectable gene (preferably an amplifiable gene).

Selectable genes and product genes may be obtained from genomic DNA,
cDNA transcribed from cellular RNA, or by in vitro synthesis. For example,
libraries are screened with probes (such as antibodies or oligonucleotides
of about 20-80 bases) designed to identify the selectable gene or the
product gene (or the protein(s) encoded thereby). Screening the cDNA or
genomic library with the selected probe may be conducted using standard
procedures as described in chapters 10-12 of Sambrook et al., Molecular
Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory
Press, 1989). An alternative means to isolate the selectable gene or
product gene is to use PCR methodology as described in section 14 of
Sambrook et al., supra.

A preferred method of practicing this invention is to use carefully
selected oligonucleotide sequences to screen cDNA libraries from various
tissues known to contain the selectable gene or product gene. The
oligonucleotide sequences selected as probes should be of sufficient length
and sufficiently unambiguous that false positives are minimized.

The oligonucleotide generally is labeled such that it can be detected
upon hybridization to DNA in the library being screened. The preferred
method of labeling is to use 32P-labeled ATP with polynucleotide kinase,
as is well known in the art, to radiolabel the oligonucleotide. However,
other methods may be used to label the oligonucleotide, including, but not
limited to, biotinylation or enzyme labeling.

Sometimes, the DNA encoding the selectable gene and product gene is
preceded by DNA encoding a signal sequence having a specific cleavage site
at the N-terminus of the mature protein or polypeptide. In general, the
signal sequence may be a component of the expression vector, or it may be
a part of the selectable gene or product gene that is inserted into the
expression vector. If a heterologous signal sequence is used, it
preferably is one that is recognized and processed (i.e., cleaved by a
signal peptidase) by the host cell. For yeast secretion the native signal
sequence may be substituted by, e.g., the yeast invertase leader, alpha
factor leader (including Saccharomyces and Kluyveromyces α-factor leaders,
the latter described in U.S. Pat. No. 5,010,182 issued 23 April 1991), or
acid phosphatase leader, the C. albicans glucoamylase leader (EP 362,179
published 4 April 1990), or the signal described in WO 90/13646 published
15 November 1990. In mammalian cell expression the native signal sequence
of the protein of interest is satisfactory, although other mammalian signal
sequences may be suitable, such as signal sequences from secreted
polypeptides of the same or related species, as well as viral secretory
leaders, for example, the herpes simplex gD signal. The DNA for such
precursor region is ligated in reading frame to the selectable gene or
product gene.

As shown in Figure 1A, the selectable gene is generally provided at
the 5' end of the DNA construct and this selectable gene is followed by the
product gene. Therefore, the full length (non-spliced) message will contain
DHFR as the first open reading frame and will therefore generate DHFR
protein to allow selection of stable transfectants. The full length message
is not expected to generate appreciable amounts of the protein of interest
as the second AUG in a dicistronic message is an inefficient initiator of
translation in mammalian cells (Kozak, J. Cell Biol., 115:887-903 [1991]).
The selectable gene is positioned within an intron. Introns are
noncoding nucleotide sequences, normally present within many eukaryotic
genes, which are removed from newly transcribed mRNA precursors in a
multiple-step process collectively referred to as splicing.

A single mechanism is thought to be responsible for the splicing of
mRNA precursors in mammalian, plant, and yeast cells. In general, the
process of splicing requires that the 5' and 3' ends of the intron be
correctly cleaved and the resulting ends of the mRNA be accurately joined,
such that a mature mRNA having the proper reading frame for protein
synthesis is produced. Analysis of a variety of naturally occurring and
synthetically constructed mutant genes has shown that nucleotide changes
at many of the positions within the consensus sequences at the 5' and 3'
splice sites have the effect of reducing or abolishing the synthesis of
mature mRNA. Sharp, Science, 235:766 (1987); Padgett, et al., Ann. Rev.
Biochem., 55:1119 (1986); Green, Ann. Rev. Genet., 20:671 (1986).
Mutational studies also have shown that RNA secondary structures involving
splicing sites can affect the efficiency of splicing. Solnick, Cell,
43:667 (1985); Konarska, et al., Cell, 42:165 (1985).

The length of the intron may also affect the efficiency of splicing.
By making deletion mutations of different sizes within the large intron of
the rabbit beta-globin gene, Wieringa, et al. determined that the minimum
intron length necessary for correct splicing is about 69 nucleotides.
Cell, 37:915 (1984). Similar studies of the intron of the adenovirus E1A
region have shown that an intron length of about 78 nucleotides allows
correct splicing to occur, but at reduced efficiency. Increasing the
length of the intron to 91 nucleotides restores normal splicing efficiency,
whereas truncating the intron to 63 nucleotides abolishes correct splicing.
Ulfendahl, et al., Nuc. Acids Res., 13:6299 (1985).

To be useful in the invention, the intron must have a length such
that splicing of the intron from the mRNA is efficient. The preparation of
introns of differing lengths is a routine matter, involving methods well
known in the art, such as de novo synthesis or in vitro deletion
mutagenesis of an existing intron. Typically, the intron will have a length
of at least about 150 nucleotides, since introns which are shorter than
this tend to be spliced less efficiently. The upper limit for the length
of the intron can be up to 30 kb or more. However, as a general
proposition, the intron is less than about 10 kb in length.
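
The length guidance above lends itself to a simple check, sketched here
in Python for illustration. The thresholds are the approximate figures
stated in the text (at least about 150 nucleotides; generally less than
about 10 kb); treating them as hard cut-offs, adding the length of an
inserted selectable gene to the intron length, and the function and
constant names are all assumptions of this sketch.

```python
MIN_INTRON_NT = 150              # "at least about 150 nucleotides"
TYPICAL_MAX_INTRON_NT = 10_000   # "less than about 10 kb"


def assess_intron_length(intron_nt, insert_nt=0):
    """Classify the total intron length after inserting a selectable gene.

    intron_nt is the length of the starting intron and insert_nt the
    length of any sequence inserted into it (both in nucleotides).
    Returns (total_length, verdict).
    """
    total = intron_nt + insert_nt
    if total < MIN_INTRON_NT:
        return total, "short: may be spliced less efficiently"
    if total > TYPICAL_MAX_INTRON_NT:
        return total, "long: exceeds the typical upper bound of about 10 kb"
    return total, "within the typical range"


# Illustrative numbers only: a 300 nt intron carrying a 1,200 nt insert.
print(assess_intron_length(300, insert_nt=1200))
```

Such a check is only a first screen; the splicing efficiency of a given
construct is established experimentally as described herein.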

The intron is modified to contain the selectable gene not normally
present within the intron using any of the various known methods for
modifying a nucleic acid in vitro. Typically, a selectable gene will be
introduced into an intron by first cleaving the intron with a restriction
endonuclease, and then covalently joining the resulting restriction
fragments to the selectable gene in the correct orientation for host cell
expression, for example by ligation with a DNA ligase enzyme.

The DNA construct is dicistronic, i.e. the selectable gene and
product gene are both under the transcriptional control of a single
transcriptional regulatory region. As mentioned above, the transcriptional
regulatory region comprises a promoter. Suitable promoting sequences for
use with yeast hosts include the promoters for 3-phosphoglycerate kinase
(Hitzeman et al., J. Biol. Chem., 255:2073 [1980]) or other glycolytic
enzymes (Hess et al., J. Adv. Enzyme Reg., 7:149 [1968]; and Holland,
Biochemistry, 17:4900 [1978]), such as enolase, glyceraldehyde-3-phosphate
dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase,
glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase,
triosephosphate isomerase, phosphoglucose isomerase, and glucokinase.

Other yeast promoters, which are inducible promoters having the
additional advantage of transcription controlled by growth conditions, are
the promoter regions for alcohol dehydrogenase 2, isocytochrome C, acid
phosphatase, degradative enzymes associated with nitrogen metabolism,
metallothionein, glyceraldehyde-3-phosphate dehydrogenase, and enzymes
responsible for maltose and galactose utilization. Suitable vectors and
promoters for use in yeast expression are further described in Hitzeman et
al., EP 73,657A. Yeast enhancers also are advantageously used with yeast
promoters.

Expression control sequences are known for eukaryotes. Virtually all
eukaryotic genes have an AT-rich region located approximately 25 to 30
bases upstream from the site where transcription is initiated. Another
sequence found 70 to 80 bases upstream from the start of transcription of
many genes is a CXCAAT region where X may be any nucleotide.

Product gene transcription from vectors in mammalian host cells is
controlled by promoters obtained from the genomes of viruses such as
polyoma virus, fowlpox virus (UK 2,211,504 published 5 July 1989),
adenovirus (such as Adenovirus 2), bovine papilloma virus, avian sarcoma
virus, cytomegalovirus, a retrovirus, hepatitis-B virus and most preferably
Simian Virus 40 (SV40), from heterologous mammalian promoters, e.g. the
actin promoter or an immunoglobulin promoter, from heat-shock promoters,
and from the promoter normally associated with the product gene, provided
such promoters are compatible with the host cell systems.

The early and late promoters of the SV40 virus are conveniently
obtained as an SV40 restriction fragment that also contains the SV40 viral
origin of replication. Fiers et al., Nature, 273:113 (1978); Mulligan and
Berg, Science, 209:1422-1427 (1980); Pavlakis et al., Proc. Natl. Acad.
Sci. USA, 78:7398-7402 (1981). The immediate early promoter of the human
cytomegalovirus (CMV) is conveniently obtained as a HindIII E restriction
fragment. Greenaway et al., Gene, 18:355-360 (1982). A system for
expressing DNA in mammalian hosts using the bovine papilloma virus as a
vector is disclosed in U.S. 4,419,446. A modification of this system is
described in U.S. 4,601,978. See also Gray et al., Nature, 295:503-508
(1982) on expressing cDNA encoding immune interferon in monkey cells;
Reyes et al., Nature, 297:598-601 (1982) on expression of human
β-interferon cDNA in mouse cells under the control of a thymidine kinase
promoter from herpes simplex virus; Canaani and Berg, Proc. Natl. Acad.
Sci. USA, 79:5166-5170 (1982) on expression of the human interferon β1 gene
in cultured mouse and rabbit cells; and Gorman et al., Proc. Natl. Acad.
Sci. USA, 79:6777-6781 (1982) on expression of bacterial CAT sequences in
CV-1 monkey kidney cells, chicken embryo fibroblasts, Chinese hamster ovary
cells, HeLa cells, and mouse NIH-3T3 cells using the Rous sarcoma virus
long terminal repeat as a promoter.

Preferably the transcriptional regulatory region in higher eukaryotes
comprises an enhancer sequence. Enhancers are relatively orientation and
position independent, having been found 5' (Laimins et al., Proc. Natl.
Acad. Sci. USA, 78:993 [1981]) and 3' (Lusky et al., Mol. Cell Bio., 3:1108
[1983]) to the transcription unit, within an intron (Banerji et al., Cell,
33:729 [1983]) as well as within the coding sequence itself (Osborne et
al., Mol. Cell Bio., 4:1293 [1984]). Many enhancer sequences are now known
from mammalian genes (globin, elastase, albumin, α-fetoprotein and
insulin). Typically, however, one will use an enhancer from a eukaryotic
cell virus. Examples include the SV40 enhancer on the late side of the
replication origin (bp 100-270), the cytomegalovirus early promoter
enhancer (CMV), the polyoma enhancer on the late side of the replication
origin, and adenovirus enhancers. See also Yaniv, Nature, 297:17-18 (1982)
on enhancing elements for activation of eukaryotic promoters. The enhancer
may be spliced into the vector at a position 5' or 3' to the product gene,
but is preferably located at a site 5' from the promoter.

The DNA construct has a transcriptional initiation site following the
transcriptional regulatory region and a transcriptional termination region
following the product gene (see Figure 1A). These sequences are provided
in the DNA construct using techniques which are well known in the art.

The DNA construct normally forms part of an expression vector which
may have other components such as an origin of replication (i.e., a nucleic
acid sequence that enables the vector to replicate in one or more selected
host cells) and, if desired, one or more additional selectable gene(s).
Construction of suitable vectors containing the desired coding and control
sequences employs standard ligation techniques. Isolated plasmids or DNA
fragments are cleaved, tailored, and religated in the form desired to
generate the plasmids required.

Generally, in cloning vectors the origin of replication is one that
enables the vector to replicate independently of the host chromosomal DNA,
and includes origins of replication or autonomously replicating sequences.
Such sequences are well known. The 2μ plasmid origin of replication is
suitable for yeast, and various viral origins (SV40, polyoma, adenovirus,
VSV or BPV) are useful for cloning vectors in mammalian cells. Generally,
the origin of replication component is not needed for mammalian expression
vectors (the SV40 origin may typically be used only because it contains the
early promoter).

Most expression vectors are "shuttle" vectors, i.e., they are capable
of replication in at least one class of organisms but can be transfected
into another organism for expression. For example, a vector is cloned in
E. coli and then the same vector is transfected into yeast or mammalian
cells for expression even though it is not capable of replicating
independently of the host cell chromosome.

For analysis to confirm correct sequences in plasmids constructed,
plasmids from the transformants are prepared, analyzed by restriction,
and/or sequenced by the method of Messing et al., Nucleic Acids Res., 9:309
(1981) or by the method of Maxam et al., Methods in Enzymology, 65:499
(1980).

The expression vector having the DNA construct prepared as discussed
above is transformed into a eukaryotic host cell. Suitable host cells for
cloning or expressing the vectors herein are yeast or higher eukaryote
cells.

Eukaryotic microbes such as filamentous fungi or yeast are suitable
hosts for vectors containing the product gene. Saccharomyces cerevisiae,
or common baker's yeast, is the most commonly used among lower eukaryotic
host microorganisms. However, a number of other genera, species, and
strains are commonly available and useful herein, such as S. pombe [Beach
and Nurse, Nature, 290:140 (1981)], Kluyveromyces lactis [Louvencourt et
al., J. Bacteriol., 737 (1983)], Yarrowia [EP 402,226], Pichia pastoris [EP
183,070], Trichoderma reesia [EP 244,234], Neurospora crassa [Case et al.,
Proc. Natl. Acad. Sci. USA, 76:5259-5263 (1979)], and Aspergillus hosts
such as A. nidulans [Ballance et al., Biochem. Biophys. Res. Commun.,
112:284-289 (1983); Tilburn et al., Gene, 26:205-221 (1983); Yelton et al.,
Proc. Natl. Acad. Sci. USA, 81:1470-1474 (1984)] and A. niger [Kelly and
Hynes, EMBO J., 4:475-479 (1985)].

Suitable host cells for the expression of the product gene are
derived from multicellular organisms. Such host cells are capable of
complex processing and glycosylation activities. In principle, any higher
eukaryotic cell culture is workable, whether from vertebrate or
invertebrate culture. Examples of invertebrate cells include plant and
insect cells. Numerous baculoviral strains and variants and corresponding
permissive insect host cells from hosts such as Spodoptera frugiperda
(caterpillar), Aedes aegypti (mosquito), Aedes albopictus (mosquito),
Drosophila melanogaster (fruit fly), and Bombyx mori host cells have been
identified. See, e.g., Luckow et al., Bio/Technology, 6:47-55 (1988);
Miller et al., in Genetic Engineering, Setlow, J.K. et al., eds., Vol. 8
(Plenum Publishing, 1986), pp. 277-279; and Maeda et al., Nature, 315:592-
594 (1985). A variety of such viral strains are publicly available, e.g.,
the L-1 variant of Autographa californica NPV and the Bm-5 strain of Bombyx
mori NPV, and such viruses may be used as the virus herein according to the
present invention, particularly for transfection of Spodoptera frugiperda
cells.

Plant cell cultures of cotton, corn, potato, soybean, petunia,
tomato, and tobacco can be utilized as hosts. Typically, plant cells are
transfected by incubation with certain strains of the bacterium
Agrobacterium tumefaciens, which has been previously manipulated to contain
the product gene. During incubation of the plant cell culture with A.
tumefaciens, the product gene is transferred to the plant cell host such
that it is transfected, and will, under appropriate conditions, express the
product gene. In addition, regulatory and signal sequences compatible with
plant cells are available, such as the nopaline synthase promoter and
polyadenylation signal sequences. Depicker et al., J. Mol. Appl. Gen.,
1:561 (1982). In addition, DNA segments isolated from the upstream region
of the T-DNA 780 gene are capable of activating or increasing transcription
levels of plant-expressible genes in recombinant DNA-containing plant
tissue. EP 321,196 published 21 June 1989.

However, interest has been greatest in vertebrate cells, and
propagation of vertebrate cells in culture (tissue culture) has become a
routine procedure in recent years [Tissue Culture, Academic Press, Kruse
and Patterson, editors (1973)]. Examples of useful mammalian host cell
lines are monkey kidney CV1 line transformed by SV40 (COS-7, ATCC CRL
1651); human embryonic kidney line (293 or 293 cells subcloned for growth
in suspension culture, Graham et al., J. Gen Virol., 36:59 [1977]); baby
hamster kidney cells (BHK, ATCC CCL 10); Chinese hamster ovary cells/-DHFR
(CHO, Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77:4216 [1980]);
dp12.CHO cells (EP 307,247 published 15 March 1989); mouse Sertoli cells
(TM4, Mather, Biol. Reprod., 23:243-251 [1980]); monkey kidney cells (CV1
ATCC CCL 70); African green monkey kidney cells (VERO-76, ATCC CRL-1587);
human cervical carcinoma cells (HELA, ATCC CCL 2); canine kidney cells
(MDCK, ATCC CCL 34); buffalo rat liver cells (BRL 3A, ATCC CRL 1442); human
lung cells (W138, ATCC CCL 75); human liver cells (Hep G2, HB 8065); mouse
mammary tumor (MMT 060562, ATCC CCL51); TRI cells (Mather et al., Annals
N.Y. Acad. Sci., 383:44-68 [1982]); MRC 5 cells; FS4 cells; and a human
hepatoma line (Hep G2).

Host cells are transformed with the above-described expression or
cloning vectors of this invention and cultured in conventional nutrient
media modified as appropriate for inducing promoters, selecting
transformants, or amplifying the genes encoding the desired sequences.

Infection with Agrobacterium tumefaciens is used for transformation
of certain plant cells, as described by Shaw et al., Gene, 23:315 (1983)
and WO 89/05859 published 29 June 1989. For mammalian cells without such
cell walls, the calcium phosphate precipitation method of Graham and van
der Eb, Virology, 52:456-457 (1978) may be used. General aspects of
mammalian cell host system transformations have been described by Axel in
U.S. 4,399,216 issued 16 August 1983. Transformations into yeast are
typically carried out according to the method of Van Solingen et al., J.
Bact., 130:946 (1977) and Hsiao et al., Proc. Natl. Acad. Sci. (USA),
76:3829 (1979). However, other methods for introducing DNA into cells such
as by nuclear injection or by protoplast fusion may also be used.

In the preferred embodiment the DNA is introduced into the host cells
using electroporation. See Andreason, J. Tiss. Cult. Meth., 15:56-62
(1993), for a review of electroporation techniques useful for practicing
the instantly claimed invention. It was discovered that electroporation
techniques for introducing the DNA construct into the host cells were
preferable over calcium phosphate precipitation techniques insofar as the
latter could cause the DNA to break up and form concatemers.

The mammalian host cells used to express the product gene herein
may be cultured in a variety of media as discussed in the definitions
section above. The media contains the selection agent used for selecting
transformed host cells which have taken up the DNA construct (either as an
intra- or extra-chromosomal element). To achieve selection of the
transformed eukaryotic cells, the host cells may be grown in cell culture
plates and individual colonies expressing the selectable gene (and thus the
product gene) can be isolated and grown in growth medium until the
nutrients are depleted. The host cells are then analyzed for transcription
and/or transformation as discussed below. The culture conditions, such as
temperature, pH, and the like, are those previously used with the host cell
selected for expression, and will be apparent to the ordinarily skilled
artisan.

Gene amplification and/or expression may be measured in a sample
directly, for example, by conventional Southern blotting, Northern blotting
to quantitate the transcription of mRNA (Thomas, Proc. Natl. Acad. Sci.
USA, 77:5201-5205 [1980]), dot blotting (DNA analysis), or in situ
hybridization, using an appropriately labeled probe, based on the sequences
provided herein. Various labels may be employed, most commonly
radioisotopes, particularly 32P. However, other techniques may also be
employed, such as using biotin-modified nucleotides for introduction into
a polynucleotide. The biotin then serves as the site for binding to avidin
or antibodies, which may be labeled with a wide variety of labels, such as
radionuclides, fluorescers, enzymes, or the like. Alternatively,
antibodies may be employed that can recognize specific duplexes, including
DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein
duplexes. The antibodies in turn may be labeled and the assay may be
carried out where the duplex is bound to a surface, so that upon the
formation of duplex on the surface, the presence of antibody bound to the
duplex can be detected.

Gene expression, alternatively, may be measured by immunological
methods, such as immunohistochemical staining of tissue sections and assay
of cell culture or body fluids, to quantitate directly the expression of
gene product. With immunohistochemical staining techniques, a cell sample
is prepared, typically by dehydration and fixation, followed by reaction
with labeled antibodies specific for the gene product coupled, where the
labels are usually visually detectable, such as enzymatic labels,
fluorescent labels, luminescent labels, and the like. A particularly
sensitive staining technique suitable for use in the present invention is
described by Hsu et al., Am. J. Clin. Path., 75:734-738 (1980).

In the preferred embodiment, the mRNA is analyzed by quantitative PCR
(to determine the efficiency of splicing) and protein expression is
measured using ELISA as described in Example 1 herein.

The product of interest preferably is recovered from the culture
medium as a secreted polypeptide, although it also may be recovered from
host cell lysates when directly expressed without a secretory signal. When
the product gene is expressed in a recombinant cell other than one of human
origin, the product of interest is completely free of proteins or
polypeptides of human origin. However, it is necessary to purify the
product of interest from recombinant cell proteins or polypeptides to
obtain preparations that are substantially homogeneous as to the product
of interest. As a first step, the culture medium or lysate is centrifuged
to remove particulate cell debris. The product of interest thereafter is
purified from contaminant soluble proteins and polypeptides, for example,
by fractionation on immunoaffinity or ion-exchange columns; ethanol
precipitation; reverse phase HPLC; chromatography on silica or on an anion
exchange resin such as DEAE; chromatofocusing; SDS-PAGE; ammonium sulfate
precipitation; gel filtration using, for example, Sephadex G-75;
chromatography on plasminogen columns to bind the product of interest and
protein A Sepharose columns to remove contaminants such as IgG.

The following examples are offered by way of illustration only and
are not intended to limit the invention in any manner. All patent and
literature references cited herein are expressly incorporated by reference.

EXAMPLE 1 

tPA production using the dicistronic expression vectors

It was sought to increase the level of homogeneity with regard to
expression levels of stable clones by expressing a selectable marker (such
as DHFR) and the protein of interest from a single promoter. These vectors
divert most of the transcript to product expression while linking it at a
fixed ratio to DHFR expression via differential splicing.

Vectors were constructed which were derived from the vector pRK (Suva
et al., Science, 237:893-896 [1987]) which contains an intron between the
cytomegalovirus immediate early promoter (CMV) and the cDNA that encodes
the polypeptide of interest. The intron of pRK is 139 nucleotides in
length, has a splice donor site derived from cytomegalovirus immediate
early gene (CMVIE), and a splice acceptor site from an IgG heavy chain
variable region (VH) gene (Eaton et al., Biochem., 25:8343 [1986]).

DHFR/intron vectors were constructed by inserting an EcoRV linker
into the BstXI site present in the intron of pRK7. An 830 base-pair
fragment containing a mouse DHFR coding fragment was inserted to obtain
DHFR intron expression vectors which differ only in the sequence that
comprises the splice donor site. Those sequences were altered by
overlapping PCR mutagenesis to obtain sequences that match splice donor
sites found between exons 3 and 4 of normal and mutant Ras genes. PCR was
also used to destroy the splice donor site.

A mouse DHFR cDNA fragment (Simonsen et al., Proc. Natl. Acad. Sci.
USA, 80:2495-2499 [1983]) was inserted into the intron of this vector 59
nucleotides downstream of the splice donor site. The splice donor site of
this vector was altered by mutagenesis to change the ratio of spliced to
non-spliced message in transfected cells. It has previously been shown
that a single nucleotide change (G to A) converted a relatively efficient
splice donor site found in the normal ras gene into an inefficient splice
site (Cohen et al., Nature, 334:119-124 [1988]). This effect has been
demonstrated in the context of the ras gene and confirmed when these
sequences were transferred to human growth hormone constructs (Cohen et
al., Cell, 58:461-472 [1989]). Additionally, a non-functional 5' splice
site (GT to CA) was constructed as a control (ΔGT). A polylinker was
inserted 35 nucleotides downstream of the 3' splice site to accept the cDNA
of interest. A vector containing tPA (Pennica et al., Nature, 301:214-221
[1983]) was linearized downstream of the polyadenylation site before it was
introduced into CHO cells (Potter et al., Proc. Natl. Acad. Sci. USA,
81:7161 [1984]).

Plasmid DNAs that contained DHFR/intron, tPA and (a) wild type ras
(WT ras), i.e. Figure 3 (SEQ ID NO: 1), (b) mutant ras, or (c) non-
functional splice donor site (ΔGT) were introduced into CHO DHFR minus
cells by electroporation. The intron vectors were each linearized
downstream of the polyadenylation site by restriction endonuclease
treatment. The control vector was linearized downstream of the second
polyadenylation site. The DNAs were ethanol precipitated after
phenol/chloroform extraction and were resuspended in 20 µl 1/10 Tris EDTA.
Then, 10 µg of DNA was incubated with 10^7 CHO.dp12 cells (EP 307,247
published 15 March 1989) in 1 ml of PBS on ice for 10 min. before
electroporation at 400 volts and 330 µF using a BRL Cell Porator.

Cells were returned to ice for 10 min. before being plated into
non-selective medium. After 24 hours cells were fed nucleoside-free medium
to select for stable DHFR+ clones which were pooled. The pooled DHFR+
clones were lysed and mRNAs were prepared.

To prepare the mRNA, RNA was extracted from 5 x 10^7 cells which were
grown from pools of more than 200 clones derived from the stable
transfection of the three vectors, the essential construction of which is
shown in Figure 1B, and from non-transfected CHO cells. RNA was purified
over oligo-dT cellulose (Collaborative Biomedical Products). 10 µg of mRNA
was then subjected to Northern blotting, which involved running the mRNA on
a 1.2% agarose, 6.6% formaldehyde gel and transferring it to a nylon
filter (Stratagene Duralon-UV membrane), which was prehybridized, probed
and washed according to the manufacturer's instructions.

The filter was probed sequentially using probes (shown in Figure 1B)
that would detect (a) the full length message, (b) both full length and
spliced message, or (c) beta actin. Probing with the long probe showed
that the vector that contains the efficient splice donor site (i.e. WT ras)
generates predominantly a mRNA of the size predicted for the spliced
product while the other two vectors gave rise primarily to a mRNA that
corresponds in size to non-spliced message. The DHFR probe detected only
full length message and demonstrated that the WT ras splice donor derived
vector generates very little full length message with which to confer a
DHFR positive phenotype.

Figure 4 shows the number of DHFR positive colonies obtained after
duplicate electroporations with the three intron vectors described above
and from a conventional vector that has a CMV promoter driving tPA and an
SV40 promoter driving DHFR (see Figure 2). The increase in colony number
parallels the increase in full length message that accumulates with the
modification of the splice donor sites. The conventional vector
efficiently generates colonies and does not vary significantly from the ΔGT
construct.

The level of tPA expression was determined by seeding cells in 1 ml
of F12:DMEM (50:50, with 5% FBS) in 24 well dishes to near confluency.
Growth of the cells continued until the media was exhausted. Media was
then assayed by ELISA for tPA production. Briefly, anti-tPA antibody was
coated onto the wells of an ELISA microtiter plate, media samples were
added to the wells followed by washing. Binding of the antigen (tPA) was
then quantified using horseradish peroxidase (HRPO) labelled anti-tPA
antibody.

Figure 5A depicts the titers of secreted tPA protein after pooling
the clones of each group shown in Figure 4. While the number of colonies
increased with a weakening of splice donor function, the inverse was seen
with respect to tPA expression. The expression levels are consistent with
the RNA products that are observed; as more of the dicistronic message is
spliced an increased amount of message will contain tPA as the first open
reading frame resulting in increased tPA expression. A mutation of GT to
CA in the splice donor site results in an abundance of DHFR positive
colonies which express undetectable levels of tPA, possibly resulting from
inefficient utilization of the second AUG. Importantly, Figure 5A also
shows that expression levels obtained from one of the dicistronic vectors
(with WT ras SD) were about threefold higher than those obtained with the
control vector containing a CMV promoter/enhancer driving tPA, SV40
promoter/enhancer controlling DHFR and SV40 polyadenylation signals
controlling the expression of tPA and DHFR.

Additionally, the homogeneity of expression in the pools was
investigated. Figure 5B shows that all 20 clones generated by the WT ras
splice donor site derived dicistronic vectors express detectable levels of
tPA while only 4 of 20 clones generated by the control vector express tPA.
None of the clones transfected with the non-splicing (ΔGT) vector expressed
tPA levels detectable by ELISA. This finding is consistent with previous
observations that relatively few clones generated by conventional vectors
make useful levels of protein.

Expression of tPA was increased following methotrexate amplification
of pools. Figure 5C shows that 2 of the dicistronic vector derived pools
(i.e. with WT ras and mutant ras SD sites) increased in expression markedly
(8.4 and 7.7 fold), while the pool generated by the conventional vector
increased only slightly (2.8 fold) when each was subjected to 200 nM Mtx.
An overall increase of 9 fold was obtained using the best dicistronic (WT
ras SD) versus the conventional vector following amplification. Growth of
the highest expressing amplified pool in nutrient rich production medium
yielded titers of 4.2 µg/ml tPA.
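
As a consistency check on the figures above, the 9 fold overall advantage
can be recovered by compounding the pre-amplification advantage of the WT
ras SD pool with the ratio of the two fold-amplifications. The sketch below
assumes that "about threefold" (Figure 5A) is taken as exactly 3.0 and that
the 8.4 fold increase belongs to the WT ras SD pool; the function name is
illustrative and is not part of the described method.

```python
def relative_titer_after_amplification(initial_ratio, fold_test, fold_control):
    """Ratio of test-pool titer to control-pool titer after amplification.

    initial_ratio: test/control titer ratio before amplification
    fold_test:     fold increase of the test pool upon amplification
    fold_control:  fold increase of the control pool upon amplification
    """
    return initial_ratio * fold_test / fold_control


# Assumed inputs: ~3 fold initial advantage, 8.4 fold vs 2.8 fold amplification.
overall = relative_titer_after_amplification(3.0, 8.4, 2.8)
print(f"overall advantage: {overall:.1f} fold")
```

Under these assumptions the product is 3.0 x (8.4 / 2.8), which agrees with
the 9 fold figure reported for the amplified pools.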

It was shown that manipulation of the splice donor sequence alters
the ratio of spliced to full length message and the number of colonies that
form in selective medium. It was also shown that dicistronic expression
vectors generate clones that express high levels of recombinant proteins.
Surprisingly, it was possible to isolate high expressors which had the
efficient WT ras splice donor site by selection for DHFR+ cells despite the
efficiency with which the DHFR gene was spliced from the RNA precursors
formed in these cells.

EXAMPLE 2

TNFr-IgG production using the dicistronic expression vectors

To prove the general applicability of this approach, a second product
was evaluated in the dicistronic vector system containing, as the DNA of
interest, an immunoadhesin (TNFr-IgG) capable of binding tumor necrosis
factor (TNF) (Ashkenazi et al., Proc. Natl. Acad. Sci. USA, 88:10535-10539
[1991]). The experiments described in Example 1 above were essentially
repeated except that the product gene encoded the immunoadhesin TNFr-IgG.
Plasmid DNAs that contained a TNFr-IgG cDNA and (a) WT ras, i.e. Figure
6 (SEQ ID NO: 2), (b) mutant ras or (c) nonfunctional splice donor site
(ΔGT) were introduced into the dp12.CHO cells as discussed for Example 1.
See Figure 1C for an illustration of the DNA constructs.
It was discovered that the number of DHFR positive colonies generated
by three of these vectors was similar to that seen with the tPA constructs.
Expression of TNFr-IgG also paralleled that seen with the tPA constructs
(Figure 7A). Amplification of pools from two of the constructs showed a
marked increase in expression of immunoadhesin (9.6 and 6.8 fold) (Figure
7B). The best of these amplified pools expressed 9.5 µg/ml when grown in
nutrient rich production medium.

Thus, it was again shown that dicistronic expression vectors generate
clones that express high levels of recombinant proteins. Furthermore,
contrary to expectations, it was discovered that isolation of high product
expressing host DHFR+ cells was possible using an efficient splice donor
site (i.e. the WT ras splice donor site).

EXAMPLE 3 

Antibody production using a dicistronic expression vector 

The usefulness of this system for antibody expression was evaluated
by testing production of an antibody directed against IgE (Presta et al.,
Journal of Immunology, 151:2623-2632 [1993]). Further, the flexibility of
the system with regard to transcription initiation was tested by replacing
the CMV promoter/enhancer present in the previous vectors with the
promoter/enhancer derived from the early region of SV40 virus (Griffin,
B., Structure and Genomic Organization of SV40 and Polyoma Virus, In J.
Tooze [Ed] DNA Tumor Viruses, Cold Spring Harbor Laboratory, Cold Spring
Harbor, New York). The heavy chain of the antibody was inserted downstream
of DHFR as described in the earlier tPA and TNFr-IgG constructs.
Additionally, a new splice donor site sequence (GAC:GTAAGT) was engineered
into the vector which matches the consensus splice donor site more closely
than did the splice donor sites present in the vectors tested in Examples
1 and 2. The resultant expression vector is shown in Figures 1D and 9.
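
The notion of matching the consensus "more closely" can be made concrete
by counting positions that agree with a consensus pattern. The sketch below
is illustrative only: it assumes the commonly cited 5' splice site consensus
MAG:GTRAGT (M = A or C, R = A or G; the colon marks the exon/intron
boundary), which is not stated in this specification, and the function name
is hypothetical. The second sequence scored is a hypothetical GT to CA
variant of the Example 3 site, not the actual ΔGT vector sequence.

```python
# Assumed consensus (not taken from this document): exon MAG, intron GTRAGT.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
CONSENSUS = "MAG:GTRAGT"


def donor_matches(site, consensus=CONSENSUS):
    """Count positions of an exon:intron donor site that fit the consensus."""
    observed = site.replace(":", "").upper()
    pattern = consensus.replace(":", "")
    if len(observed) != len(pattern):
        raise ValueError("site and consensus must have the same length")
    return sum(base in IUPAC[code] for base, code in zip(observed, pattern))


example3_site = "GAC:GTAAGT"      # sequence given for the Example 3 vector
hypothetical_ca = "GAC:CAAAGT"    # hypothetical GT -> CA change, for contrast

print(donor_matches(example3_site), "of 9 positions match")
print(donor_matches(hypothetical_ca), "of 9 positions match")
```

A simple position count such as this ignores the unequal importance of
individual positions (the invariant GT in particular), so it should be read
only as a rough ranking aid.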

It was discovered that this vector produced fewer colonies than the
vectors previously tested, and produced predominantly a spliced RNA
product. A second vector was constructed to have the light chain of the
antibody under control of the SV40 promoter/enhancer and poly-A and the
hygromycin B resistance gene under control of the CMV promoter/enhancer and
SV40 poly-A. These vectors were linearized at unique HpaI sites downstream
of the poly-A signal, mixed at a ratio of light chain vector to heavy chain
vector of 10:3 and electroporated into CHO cells using an optimized
protocol (as discussed in Examples 1 and 2).

Figure 11 shows the levels of antibody expressed by clones and pools
after selection in hygromycin B followed by selection for DHFR expression.
All 20 of the clones analyzed expressed high levels of antibody when grown
in rich medium and varied from one another by only a factor of four. A
pool of antibody producing clones was generated and assayed shortly after
it was established. That pool was grown continuously for 6 weeks without
a significant decrease in productivity, demonstrating that its stability was
sufficient to generate gram quantities of protein from its large scale
culture.

The pool was subjected to methotrexate amplification at 200 nM and 1 µM
and achieved a greater than 2 fold increase in antibody titer. The 1 µM Mtx
resistant pool achieved a titer of 41 mg/L when grown under optimal
conditions in suspension culture.

The structure of the expressed antibody was examined. Proteins
expressed by the 200 nM methotrexate resistant pool and by a well
characterized expression clone generated by conventional vectors (Presta
et al. [1993], supra) were metabolically labeled with S-35 cysteine and
methionine. In particular, confluent 35 mm plates of cells were
metabolically labeled with 50 µCi each S-35 methionine and S-35 cysteine
(Amersham) in serum free cysteine and methionine free F12:DMEM. After one
hour, nutrient rich production media was added and labeled proteins were
allowed to "chase" into the medium for six more hours. Proteins were run
on a 12% SDS/PAGE gel (NOVEX) non-reduced or following reduction with
β-mercaptoethanol. Dried gels were exposed to film for 16 hours. CHO
control cells were also labeled.

The majority of the antibody protein is secreted with a molecular
weight of about 155 kilodaltons, consistent with a properly disulfide-
linked antibody molecule with 2 light and 2 heavy chains. Upon reduction
the molecular weight shifts to 2 approximately equally abundant proteins
of 22.5 and 55 kilodaltons. The protein generated from the pool is
indistinguishable from the antibody produced by the well characterized
expression clone, with no apparent increase of free heavy or light chain
expressed by the pool.
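
The agreement between the non-reduced and reduced band sizes can be checked
by simple addition: two light chains and two heavy chains at the reduced
sizes should sum to the size of the intact molecule. The short sketch below
uses only the figures quoted above; the function name and the default chain
counts are illustrative assumptions.

```python
def assembled_mass_kda(light_kda, heavy_kda, n_light=2, n_heavy=2):
    """Expected mass of an assembled antibody from its reduced chain sizes."""
    return n_light * light_kda + n_heavy * heavy_kda


# Reduced bands reported above: 22.5 kDa (light) and 55 kDa (heavy).
intact = assembled_mass_kda(22.5, 55.0)
print(f"expected intact antibody: {intact} kDa")
```

The sum equals the approximately 155 kilodalton species seen without
reduction, which is the basis for describing the secreted protein as a
properly assembled two light chain, two heavy chain molecule.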

CONCLUSION 

The efficient expression system described herein utilizes vectors
consisting of promoter/enhancer elements followed by an intron containing
the selectable marker coding sequence, followed by the cDNA of interest and
a polyadenylation signal.

Several splice donor site sequences were tested for their effect on
colony number and expression of the cDNA of interest. A non-functional
splice donor site, splice donor sites found in an intron between exons 3
and 4 of mutant (mutant ras) and normal (WT ras) forms of the Harvey Ras
gene and another efficient SD site (see Example 3) were used. The vectors
were designed to direct expression of dicistronic primary transcripts.
Within a transfected cell some of the transcripts remain full length while
the remainder are spliced to excise the DHFR coding sequence. When the
splice donor site is weakened or destroyed an increase in colony number
is observed.

Expression levels show the inverse pattern, with the most efficient
splice donor sites generating the highest levels of tPA, TNFr immunoadhesin
or anti-IgE VH.

The homogeneity of expression of clones generated by the ras splice
donor site intron DHFR vectors was compared to clones generated from a
conventional vector with a separate promoter/enhancer and polyadenylation
signal for each of DHFR and tPA. The DHFR intron vector gives rise to
colonies that are much more homogeneous with regard to expression than
those generated by the conventional vector. Non-expressing clones derived
from the conventional vector may be the result of breaks in the tPA or
TNFr-IgG domain of the plasmid during integration into the genome or the
result of methylation of promoter elements (Busslinger et al., Cell,
34:197-206 [1983]; Watt et al., Genes and Development, 2:1136-1143 [1988])
driving tPA or TNFr-IgG expression. Promoter silencing by methylation or
breaks in the DHFR-intron vectors would very likely render them incapable
of conferring a DHFR positive phenotype.

It was found that pools generated by the DHFR-intron vectors could
be amplified in methotrexate and would increase in expression by a factor
of 8.4 (tPA), or 9.8 (TNFr-IgG). Pools from conventional vectors increased
by only 2.8 and 3.0 fold for tPA and TNFr-IgG when amplified similarly.
Amplified pools resulted in 9 fold higher tPA levels and 15 fold higher
TNFr-IgG levels when compared to the conventional vector amplified pools.

Without being limited to any theory, the increase in expression of
methotrexate resistant pools derived from the dicistronic vectors is likely
due to the transcriptional linkage of DHFR and the product; when cells are
selected for increased DHFR expression they consistently over-express
product. Conventional approaches lack selectable marker and cDNA
expression linkage and therefore methotrexate amplification often generates
DHFR overexpression without the concomitant increase in product expression.

Further increases of 4 and 6.3 fold in expression were obtained when
amplified tPA and TNFr-IgG pools were transferred from the media used for
the selections and amplifications to a nutrient rich production medium.

In Example 3, the expression vector had a splice donor site that more
closely matches the consensus splice donor sequence and had the heavy chain
of a humanized anti-IgE antibody inserted downstream. This vector was
linearized and co-electroporated with a second linearized vector that
expresses the hygromycin resistance gene and the light chain of the
antibody each under the control of its own promoter/enhancer and poly-A
signals. An excess of light chain expression vector over the heavy chain
dicistronic expression vector was used to bias in favor of light chain
expression. Clones and a pool were generated after hygromycin B and DHFR
selections. The clones were found to express relatively consistent, high
levels of antibody, as did the pool. The 1 µM pool achieved a titer of
41 mg/L when grown under optimal conditions in suspension culture.

The anti-IgE antibody was assessed by metabolic labeling followed by
SDS/PAGE under reducing and non-reducing conditions and found to be
indistinguishable from the protein expressed by a highly characterized
clonal cell line. Of particular importance is the finding that no free
light chain is observed in the pool relative to the clone.

A stable expression system for CHO cells has been developed that
produces high levels of recombinant proteins rapidly and with less effort
than that required by other expression systems. The vector system
generates stable clones that express consistently high levels, thereby
reducing the number of clones that must be screened to obtain a highly
productive clonal line. Alternatively, pools have been used to
conveniently generate moderate to high levels of protein. This approach
may be particularly useful when a number of related proteins are to be
expressed and compared.

Without being limited to this theory, it is possible that the vectors
that have very efficient splice donor sites generate very productive clones
because so little transcript remains non-spliced that only integration
events that lead to the generation of high levels of RNA produce enough
DHFR protein to give rise to colonies in selective medium. The high level
of spliced message from such clones is then translated into abundant
amounts of the protein of interest. Pools of clones made concurrently by
introducing conventional vectors expressed lower levels of protein and
were unstable with regard to long term expression, and their expression
could not be appreciably increased when the cells were subjected to
methotrexate amplification.
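
The reasoning in the preceding paragraph can be illustrated with a toy
calculation. The sketch below is not part of the described experiments: it
assumes that each integration event has a transcription level in arbitrary
units, that a fixed fraction of transcript is spliced (yielding product
message) while the remainder stays full length (yielding DHFR), and that a
colony forms only when the full length message reaches a threshold. All
numbers and names are hypothetical.

```python
def select_clones(transcription_levels, spliced_fraction, dhfr_threshold):
    """Return (level, product_message) for clones that survive selection."""
    survivors = []
    for level in transcription_levels:
        full_length = level * (1.0 - spliced_fraction)  # confers DHFR
        product = level * spliced_fraction              # encodes product
        if full_length >= dhfr_threshold:
            survivors.append((level, product))
    return survivors


# Hypothetical spread of transcription levels across integration sites.
levels = [1, 2, 5, 10, 20, 50]

efficient = select_clones(levels, spliced_fraction=0.9, dhfr_threshold=1.5)
weak = select_clones(levels, spliced_fraction=0.5, dhfr_threshold=1.5)

print("efficient donor:", len(efficient), "colonies,",
      "lowest product message", min(p for _, p in efficient))
print("weak donor:     ", len(weak), "colonies,",
      "lowest product message", min(p for _, p in weak))
```

In this toy setting the efficient donor yields fewer surviving colonies, but
every survivor carries a high level of product message, mirroring the
inverse relationship between colony number and expression described above.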

The system developed herein is versatile in that it allows high
levels of single and multiple subunit polypeptides to be rapidly generated
from clones or pools of stable transfectants. This expression system
combines the advantages of transient expression systems (rapid and labor
non-intensive generation of research amounts of protein) with the
concurrent development of highly productive stable production cell lines.

TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50
TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100
TTACGGTAAA TGGCCCGCCT GGCTGACCGC CCAACGACCC CCGCCCATTG 150
ACGTCAATAA TGACGTATGT TCCCATAGTA ACGCCAATAG GGACTTTCCA 200
TTGACGTCAA TGGGTGGAGT ATTTACGGTA AACTGCCCAC TTGGCAGTAC 250
ATCAAGTGTA TCATATGCCA AGTACGCCCC CTATTGACGT CAATGACGGT 300
AAATGGCCCG CCTGGCATTA TGCCCAGTAC ATGACCTTAT GGGACTTTCC 350 
TACTTGGCAG TACATCTACG TATTAGTCAT CGCTATTACC ATGGTGATGC 400 
GGTTTTGGCA GTACATCAAT GGGCGTGGAT AGCGGTTTGA CTCACGGGGA 450 
TTTCCAAGTC TCCACCCCAT TGACGTCAAT GGGAGTTTGT TTTGGCACCA 500 
AAATCAACGG GACTTTCCAA AATGTCGTAA CAACTCCGCC CCATTGACGC 550 
AAATGGGCGG TAGGCGTGTA CGGTGGGAGG TCTATATAAG CAGAGCTCGT 600 
TTAGTGAACC GTCAGATCGC CTGGAGACGC CATCCACGCT GTTTTGACCT 650 
CCATAGAAGA CACCGGGACC GATCCAGCCT CCGCGGCCGG GAACGGTGCA 700 
TTGGAACGCG GATTCCCCGT GCCAAGAGTG CTGTAAGTAC CGCCTATAGA 750 
GCGATAAGAG GATTTTATCC CCGCTGCCAT CATGGTTCGA CCATTGAACT 800 
GCATCGTCGC CGTGTCCCAA AATATGGGGA TTGGCAAGAA CGGAGACCTA 850 
CCCTGCCCTC CGCTCAGGAA CGCGTTCAAG TACTTCCAAA GAATGACCAC 900 
AACCTCTTCA GTGGAAGGTA AACAGAATCT GGTGATTATG GGTAGGAAAA 950 
CCTGGTTCTC CATTCCTGAG AAGAATCGAC CTTTAAAGGA CAGAATTAAT 1000 
ATAGTTCTCA GTAGAGAACT CAAAGAACCA CCACGAGGAG CTCATTTTCT 1050 
TGCCAAAAGT TTGGATGATG CCTTAAGACT TATTGAACAA CCGGAATTGG 1100 
CAAGTAAAGT AGACATGGTT TGGATAGTCG GAGGCAGTTC TGTTTACCAG 1150 
GAAGCCATGA ATCAACCAGG CCACCTTAGA CTCTTTGTGA CAAGGATCAT 1200 
GCAGGAATTT GAAAGTGACA CGTTTTTCCC AGAAATTGAT TTGGGGAAAT 1250 
ATAAACCTCT CCCAGAATAC CCAGGCGTCC TCTCTGAGGT CCAGGAGGAA 1300 
AAAGGCATCA AGTATAAGTT TGAAGTCTAC GAGAAGAAAG ACTAACAGGA 1350 
AGATGCTTTC AAGTTCTCTG CTCCCCTCCT AAAGCTATGC ATTTTTATAA 1400
GACCATGGGA CTTTTGCTGG CTTTAGACCC CCTTGGCTTC GTTAGAACGC 1450
GGCTACAATT AATACATAAC CTTATGTATC ATACACATAG ATTTAGGTGA 1500
CACTATAGAA TAACATCCAC TTTGCCTTTC TCTCCACAGG TGTCACTCCA 1550
GGTCAACTGC ACCTCGGTTC TAAGCTTGGG CTGCAGGTCG CCGTGAATTT 1600
AAGGGACGCT GTGAAGCAAT CATGGATGCA ATGAAGAGAG GGCTCTGCTG 1650
TGTGCTGCTG CTGTGTGGAG CAGTCTTCGT TTCGCCCAGC CAGGAAATCC 1700
ATGCCCGATT CAGAAGAGGA GCCAGATCTT ACCAAGTGAT CTGCAGAGAT 1750
GAAAAAACGC AGATGATATA CCAGCAACAT CAGTCATGGC TGCGCCCTGT 1800
GCTCAGAAGC AACCGGGTGG AATATTGCTG GTGCAACAGT GGCAGGGCAC 1850
AGTGCCACTC AGTGCCTGTC AAAAGTTGCA GCGAGCCAAG GTGTTTCAAC 1900
GGGGGCACCT GCCAGCAGGC CCTGTACTTC TCAGATTTCG TGTGCCAGTG 1950
CCCCGAAGGA TTTGCTGGGA AGTGCTGTGA AATAGATACC AGGGCCACGT 2000
GCTACGAGGA CCAGGGCATC AGCTACAGGG GCACGTGGAG CACAGCGGAG 2050
AGTGGCGCCG AGTGCACCAA CTGGAACAGC AGCGCGTTGG CCCAGAAGCC 2100
CTACAGCGGG CGGAGGCCAG ACGCCATCAG GCTGGGCCTG GGGAACCACA 2150
ACTACTGCAG AAACCCAGAT CGAGACTCAA AGCCCTGGTG CTACGTCTTT 2200
AAGGCGGGGA AGTACAGCTC AGAGTTCTGC AGCACCCCTG CCTGCTCTGA 2250
GGGAAACAGT GACTGCTACT TTGGGAATGG GTCAGCCTAC CGTGGCACGC 2300
ACAGCCTCAC CGAGTCGGGT GCCTCCTGCC TCCCGTGGAA TTCCATGATC 2350
CTGATAGGCA AGGTTTACAC AGCACAGAAC CCCAGTGCCC AGGCACTGGG 2400
CCTGGGCAAA CATAATTACT GCCGGAATCC TGATGGGGAT GCCAAGCCCT 2450
GGTGCCACGT GCTGAAGAAC CGCAGGCTGA CGTGGGAGTA CTGTGATGTG 2500
CCCTCCTGCT CCACCTGCGG CCTGAGACAG TACAGCCAGC CTCAGTTTCG 2550

WO 96/04391 PCT/US95/09576

CATCAAAGGA GGGCTCTTCG CCGACATCGC CTCCCACCCC TGGCAGGCTG 2600
CCATCTTTGC CAAGCACAGG AGGTCGCCCG GAGAGCGGTT CCTGTGCGGG 2650
GGCATACTCA TCAGCTCCTG CTGGATTCTC TCTGCCGCCC ACTGCTTCCA 2700
GGAGAGGTTT CCGCCCCACC ACCTGACGGT GATCTTGGGC AGAACATACC 2750
GGGTGGTCCC TGGCGAGGAG GAGCAGAAAT TTGAAGTCGA AAAATACATT 2800
GTCCATAAGG AATTCGATGA TGACACTTAC GACAATGACA TTGCGCTGCT 2850
GCAGCTGAAA TCGGATTCGT CCCGCTGTGC CCAGGAGAGC AGCGTGGTCC 2900
GCACTGTGTG CCTTCCCCCG GCGGACCTGC AGCTGCCGGA CTGGACGGAG 2950
TGTGAGCTCT CCGGCTACGG CAAGCATGAG GCCTTGTCTC CTTTCTATTC 3000
GGAGCGGCTG AAGGAGGCTC ATGTCAGACT GTACCCATCC AGCCGCTGCA 3050
CATCACAACA TTTACTTAAC AGAACAGTCA CCGACAACAT GCTGTGTGCT 3100
GGAGACACTC GGAGCGGCGG GCCCCAGGCA AACTTGCACG ACGCCTGCCA 3150
GGGCGATTCG GGAGGCCCCC TGGTGTGTCT GAACGATGGC CGCATGACTT 3200
TGGTGGGCAT CATCAGCTGG GGCCTGGGCT GTGGACAGAA GGATGTCCCG 3250
GGTGTGTACA CCAAGGTTAC CAACTACCTA GACTGGATTC GTGACAACAT 3300
GCGACCGTGA CCAGGAACAC CCGACTCCTC AAAAGCAAAT GAGATCCCGC 3350
CTCTTCTTCT TCAGAAGACA CTGCAAAGGC GCAGTGCTTC TCTACAGACT 3400
TCTCCAGACC CACCACACCG CAGAAGCGGG ACGAGACCCT ACAGGAGAGG 3450
GAAGAGTGCA TTTTCCCAGA TACTTCCCAT TTTGGAAGTT TTCAGGACTT 3500
GGTCTGATTT CAGGATACTC TGTCAGATGG GAAGACATGA ATGCACACTA 3550
GCCTCTCCAG GAATGCCTCC TCCCTGGGCA GAAGTGGGGG GAATTCAATC 3600
GATGGCCGCC ATGGCCCAAC TTGTTTATTG CAGCTTATAA TGGTTACAAA 3650
TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT TTTCACTGCA 3700
TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 3750
TCGATCGGGA ATTAATTCGG CGCAGCACCA TGGCCTGAAA TAACCTCTGA 3800 
AAGAGGAACT TGGTTAGGTA CCTTCTGAGG CGGAAAGAAC CAGCTGTGGA 3850 
ATGTGTGTCA GTTAGGGTGT GGAAAGTCCC CAGGCTCCCC AGCAGGCAGA 3900 
AGTATGCAAA GCATGCATCT CAATTAGTCA GCAACCAGGT GTGGAAAGTC 3950 
CCCAGGCTCC CCAGCAGGCA GAAGTATGCA AAGCATGCAT CTCAATTAGT 4000 
CAGCAACCAT AGTCCCGCCC CTAACTCCGC CCATCCCGCC CCTAACTCCG 4050 
CCCAGTTCCG CCCATTCTCC GCCCCATGGC TGACTAATTT TTTTTATTTA 4100 
TGCAGAGGCC GAGGCCGCCT CGGCCTCTGA GCTATTCCAG AAGTAGTGAG 4150 
GAGGCTTTTT TGGAGGCCTA GGCTTTTGCA AAAAGCTGTT AACAGCTTGG 4200 
CACTGGCCGT CGTTTTACAA CGTCGTGACT GGGAAAACCC TGGCGTTACC 4250 
CAACTTAATC GCCTTGCAGC ACATCCCCCC TTCGCCAGCT GGCGTAATAG 4300 
CGAAGAGGCC CGCACCGATC GCCCTTCCCA ACAGTTGCGT AGCCTGAATG 4350 
GCGAATGGCG CCTGATGCGG TATTTTCTCC TTACGCATCT GTGCGGTATT 4400 
TCACACCGCA TACGTCAAAG CAACCATAGT ACGCGCCCTG TAGCGGCGCA 4450 
TTAAGCGCGG CGGGTGTGGT GGTTACGCGC AGCGTGACCG CTACACTTGC 4500 
CAGCGCCCTA GCGCCCGCTC CTTTCGCTTT CTTCCCTTCC TTTCTCGCCA 4550 
CGTTCGCCGG CTTTCCCCGT CAAGCTCTAA ATCGGGGGCT CCCTTTAGGG 4600 
TTCCGATTTA GTGCTTTACG GCACCTCGAC CCCAAAAAAC TTGATTTGGG 4650 
TGATGGTTCA CGTAGTGGGC CATCGCCCTG ATAGACGGTT TTTCGCCCTT 4700 
TGACGTTGGA GTCCACGTTC TTTAATAGTG GACTCTTGTT CCAAACTGGA 4750 
ACAACACTCA ACCCTATCTC GGGCTATTCT TTTGATTTAT AAGGGATTTT 4800
GCCGATTTCG GCCTATTGGT TAAAAAATGA GCTGATTTAA CAAAAATTTA 4850 



ACGCGAATTT TAACAAAATA TTAACGTTTA CAATTTTATG GTGCACTCTC 4900 
AGTACAATCT GCTCTGATGC CGCATAGTTA AGCCAACTCC GCTATCGCTA 4950
CGTGACTGGG TCATGGCTGC GCCCCGACAC CCGCCAACAC CCGCTGACGC 5000
GCCCTGACGG GCTTGTCTGC TCCCGGCATC CGCTTACAGA CAAGCTGTGA 5050
CCGTCTCCGG GAGCTGCATG TGTCAGAGGT TTTCACCGTC ATCACCGAAA 5100
CGCGCGAGGC AGTATTCTTG AAGACGAAAG GGCCTCGTGA TACGCCTATT 5150
TTTATAGGTT AATGTCATGA TAATAATGGT TTCTTAGACG TCAGGTGGCA 5200
CTTTTCGGGG AAATGTGCGC GGAACCCCTA TTTGTTTATT TTTCTAAATA 5250
CATTCAAATA TGTATCCGCT CATGAGACAA TAACCCTGAT AAATGCTTCA 5300
ATAATATTGA AAAAGGAAGA GTATGAGTAT TCAACATTTC CGTGTCGCCC 5350
TTATTCCCTT TTTTGCGGCA TTTTGCCTTC CTGTTTTTGC TCACCCAGAA 5400
ACGCTGGTGA AAGTAAAAGA TGCTGAAGAT CAGTTGGGTG CACGAGTGGG 5450
TTACATCGAA CTGGATCTCA ACAGCGGTAA GATCCTTGAG AGTTTTCGCC 5500
CCGAAGAACG TTTTCCAATG ATGAGCACTT TTAAAGTTCT GCTATGTGGC 5550
GCGGTATTAT CCCGTGATGA CGCCGGGCAA GAGCAACTCG GTCGCCGCAT 5600
ACACTATTCT CAGAATGACT TGGTTGAGTA CTCACCAGTC ACAGAAAAGC 5650
ATCTTACGGA TGGCATGACA GTAAGAGAAT TATGCAGTGC TGCCATAACC 5700
ATGAGTGATA ACACTGCGGC CAACTTACTT CTGACAACGA TCGGAGGACC 5750
GAAGGAGCTA ACCGCTTTTT TGCACAACAT GGGGGATCAT GTAACTCGCC 5800
TTGATCGTTG GGAACCGGAG CTGAATGAAG CCATACCAAA CGACGAGCGT 5850
GACACCACGA TGCCAGCAGC AATGGCAACA ACGTTGCGCA AACTATTAAC 5900
TGGCGAACTA CTTACTCTAG CTTCCCGGCA ACAATTAATA GACTGGATGG 5950
AGGCGGATAA AGTTGCAGGA CCACTTCTGC GCTCGGCCCT TCCGGCTGGC 6000
TGGTTTATTG CTGATAAATC TGGAGCCGGT GAGCGTGGGT CTCGCGGTAT 6050
CATTGCAGCA CTGGGGCCAG ATGGTAAGCC CTCCCGTATC GTAGTTATCT 6100
ACACGACGGG GAGTCAGGCA ACTATGGATG AACGAAATAG ACAGATCGCT 6150
GAGATAGGTG CCTCACTGAT TAAGCATTGG TAACTGTCAG ACCAAGTTTA 6200
CTCATATATA CTTTAGATTG ATTTAAAACT TCATTTTTAA TTTAAAAGGA 6250
TCTAGGTGAA GATCCTTTTT GATAATCTCA TGACCAAAAT CCCTTAACGT 6300
GAGTTTTCGT TCCACTGAGC GTCAGACCCC GTAGAAAAGA TCAAAGGATC 6350
TTCTTGAGAT CCTTTTTTTC TGCGCGTAAT CTGCTGCTTG CAAACAAAAA 6400
AACCACCGCT ACCAGCGGTG GTTTGTTTGC CGGATCAAGA GCTACCAACT 6450
CTTTTTCCGA AGGTAACTGG CTTCAGCAGA GCGCAGATAC CAAATACTGT 6500
CCTTCTAGTG TAGCCGTAGT TAGGCCACCA CTTCAAGAAC TCTGTAGCAC 6550
CGCCTACATA CCTCGCTCTG CTAATCCTGT TACCAGTGGC TGCTGCCAGT 6600
GGCGATAAGT CGTGTCTTAC CGGGTTGGAC TCAAGACGAT AGTTACCGGA 6650
TAAGGCGCAG CGGTCGGGCT GAACGGGGGG TTCGTGCACA CAGCCCAGCT 6700
TGGAGCGAAC GACCTACACC GAACTGAGAT ACCTACAGCG TGAGCATTGA 6750
GAAAGCGCCA CGCTTCCCGA AGGGAGAAAG GCGGACAGGT ATCCGGTAAG 6800
CGGCAGGGTC GGAACAGGAG AGCGCACGAG GGAGCTTCCA GGGGGAAACG 6850
CCTGGTATCT TTATAGTCCT GTCGGGTTTC GCCACCTCTG ACTTGAGCGT 6900
CGATTTTTGT GATGCTCGTC AGGGGGGCGG AGCCTATGGA AAAACGCCAG 6950
CAACGCGGCC TTTTTACGGT TCCTGGCCTT TTGCTGGCCT TTTGCTCACA 7000
TGTTCTTTCC TGCGTTATCC CCTGATTCTG TGGATAACCG TATTACCGCC 7050
TTTGAGTGAG CTGATACCGC TCGCCGCAGC CGAACGACCG AGCGCAGCGA 7100
GTCAGTGAGC GAGGAAGCGG AAGAGCGCCC AATACGCAAA CCGCCTCTCC 7150
CCGCGCGTTG GCCGATTCAT TAATCCAGCT GGCACGACAG GTTTCCCGAC 7200
TGGAAAGCGG GCAGTGAGCG CAACGCAATT AATGTGAGTT ACCTCACTCA 7250 
TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT ATGTTGTGTG 7300 
GAATTGTGAG CGGATAACAA TTTCACACAG GAAACAGCTA TGACCATGAT 7350 
TACGAATTAA 7360 

(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6889 bases

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50 
TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100 
TTACGGTAAA TGGCCCGCCT GGCTGACCGC CCAACGACCC CCGCCCATTG 150 
ACGTCAATAA TGACGTATGT TCCCATAGTA ACGCCAATAG GGACTTTCCA 200 
TTGACGTCAA TGGGTGGAGT ATTTACGGTA AACTGCCCAC TTGGCAGTAC 250 
ATCAAGTGTA TCATATGCCA AGTACGCCCC CTATTGACGT CAATGACGGT 300 
AAATGGCCCG CCTGGCATTA TGCCCAGTAC ATGACCTTAT GGGACTTTCC 350 
TACTTGGCAG TACATCTACG TATTAGTCAT CGCTATTACC ATGGTGATGC 400 
GGTTTTGGCA GTACATCAAT GGGCGTGGAT AGCGGTTTGA CTCACGGGGA 450 
TTTCCAAGTC TCCACCCCAT TGACGTCAAT GGGAGTTTGT TTTGGCACCA 500 
AAATCAACGG GACTTTCCAA AATGTCGTAA CAACTCCGCC CCATTGACGC 550 
AAATGGGCGG TAGGCGTGTA CGGTGGGAGG TCTATATAAG CAGAGCTCGT 600 
TTAGTGAACC GTCAGATCGC CTGGAGACGC CATCCACGCT GTTTTGACCT 650 
CCATAGAAGA CACCGGGACC GATCCAGCCT CCGCGGCCGG GAACGGTGCA 700
TTGGAACGCG GATTCCCCGT GCCAAGAGTG CTGTAAGTAC CGCCTATAGA 750
GCGATAAGAG GATTTTATCC CCGCTGCCAT CATGGTTCGA CCATTGAACT 800 
GCATCGTCGC CGTGTCCCAA AATATGGGGA TTGGCAAGAA CGGAGACCTA 850 
CCCTGCCCTC CGCTCAGGAA CGCGTTCAAG TACTTCCAAA GAATGACCAC 900 
AACCTCTTCA GTGGAAGGTA AACAGAATCT GGTGATTATG GGTAGGAAAA 950 
CCTGGTTCTC CATTCCTGAG AAGAATCGAC CTTTAAAGGA CAGAATTAAT 1000 
ATAGTTCTCA GTAGAGAACT CAAAGAACCA CCACGAGGAG CTCATTTTCT 1050 
TGCCAAAAGT TTGGATGATG CCTTAAGACT TATTGAACAA CCGGAATTGG 1100 
CAAGTAAAGT AGACATGGTT TGGATAGTCG GAGGCAGTTC TGTTTACCAG 1150 
GAAGCCATGA ATCAACCAGG CCACCTTAGA CTCTTTGTGA CAAGGATCAT 1200 
GCAGGAATTT GAAAGTGACA CGTTTTTCCC AGAAATTGAT TTGGGGAAAT 1250
ATAAACCTCT CCCAGAATAC CCAGGCGTCC TCTCTGAGGT CCAGGAGGAA 1300 
AAAGGCATCA AGTATAAGTT TGAAGTCTAC GAGAAGAAAG ACTAACAGGA 1350 
AGATGCTTTC AAGTTCTCTG CTCCCCTCCT AAAGCTATGC ATTTTTATAA 1400 
GACCATGGGA CTTTTGCTGG CTTTAGACCC CCTTGGCTTC GTTAGAACGC 1450 
GGCTACAATT AATACATAAC CTTATGTATC ATACACATAG ATTTAGGTGA 1500 
CACTATAGAA TAACATCCAC TTTGCCTTTC TCTCCACAGG TGTCACTCCA 1550 
GGTCAACTGC ACCTCGGTTC TATCGATTGA ATTCCCCGGC CATAGCTGTC 1600 
TGGCATGGGC CTCTCCACCG TGCCTGACCT GCTGCTGCCG CTGGTGCTCC 1650
TGGAGCTGTT GGTGGGAATA TACCCCTCAG GGGTTATTGG ACTGGTCCCT 1700 
CACCTAGGGG ACAGGGAGAA GAGAGATAGT GTGTGTCCCC AAGGAAAATA 1750 
TATCCACCCT CAAAATAATT CGATTTGCTG TACCAAGTGC CACAAAGGAA 1800 
CCTXCTTGTA CAATGACTGT CCAGGCCCGG GGCAGGATAC GGACTGCAGG 1850
GAGTGTGAGA GCGGCTCCTT CACCGCTTCA GAAAACCACC TCAGACACTG 1900
CCTCAGCTGC TCCAAATGCC GAAAGGAAAT GGGTCAGGTG GAGATCTCTT 1950 
CTTGCACAGT GGACCGGGAC ACCGTGTGTG GCTGCAGGAA GAACCAGTAC 2000 
CGGCATTATT GGAGTGAAAA CCTTTTCCAG TGCTTCAATT GCAGCCTCTG 2050 
CCTCAATGGG ACCGTGCACC TCTCCTGCCA GGAGAAACAG AACACCGTGT 2100 
GCACCTGCCA TGCAGGTTTC TTTCTAAGAG AAAACGAGTG TGTCTCCTGT 2150 
AGTAACTGTA AGAAAAGCCT GGAGTGCACG AAGTTGTGCC TACCCCAGAT 2200 
TGAGAATGTT AAGGGCACTG AGGACTCAGG CACCACAGAC AAGAGAGTTG 2250 
AGCTCAAAAC CCCACTTGGT GACACAACTC ACACATGCCC ACGGTGCCCA 2300
GAGCCCAAAT CTTGTGACAC ACCTCCCCCG TGCCCACGGT GCCCAGAGCC 2350 
CAAATCTTGT GACACACCTC CCCCATGCCC ACGGTGCCCA GAGCCCAAAT 2400 
CTTGTGACAC ACCTCCCCCA TGCCCACGGT GCCCAGCACC TGAACTCCTG 2450 
GGAGGACCGT CAGTCTTCCT CTTCCCCCCA AAACCCAAGG ATACCCTTAT 2500 
GATTTCCCGG ACCCCTGAGG TCACGTGCGT GGTGGTGGAC GTGAGCCACG 2550 
AAGACCCCGA GGTCCAGTTC AAGTGGTACG TGGACGGCGT GGAGGTGCAT 2600 
AATGCCAAGA CAAAGCCGCG GGAGGAGCAG TTCAACAGCA CGTTCCGTGT 2650 
GGTCAGCGTC CTCACCGTCC TGCACCAGGA CTGGCTGAAC GGCAAGGAGT 2700 
ACAAGTGCAA GGTCTCCAAC AAAGCCCTCC CAGCCCCCAT CGAGAAAACC 2750 
ATCTCCAAAA CCAAAGGACA GCCCCGAGAA CCACAGGTGT ACACCCTGCC 2800 
CCCATCCCGG GAGGAGATGA CCAAGAACCA GGTCAGCCTG ACCTGCCTGG 2850 
TCAAAGGCTT CTACCCCAGC GACATCGCCG TGGAGTGGGA GAGCAGCGGG 2900 
CAGCCGGAGA ACAACTACAA CACCACGCCT CCCATGCTGG ACTCCGACGG 2950 
CTCCTTCTTC CTCTACAGCA AGCTCACCGT GGACAAGAGC AGGTGGCAGC 3000
AGGGGAACAT CTTCTCATGC TCCGTGATGC ATGAGGCTCT GCACAACCGC 3050
TTCACGCAGA AGAGCCTCTC CCTGTCTCCG GGTAAATGAG TGCGACGGCC 3100
GGGGATCCTC TAGAGTCGAC CTGCAGAAGC TTGGCCGCCA TGGCCCAACT 3150
TGTTTATTGC AGCTTATAAT GGTTACAAAT AAAGCAATAG CATCACAAAT 3200
TTCACAAATA AAGCATTTTT TTCACTGCAT TCTAGTTGTG GTTTGTCCAA 3250
ACTCATCAAT GTATCTTATC ATGTCTGGAT CGATCGGGAA TTAATTCGGC 3300
GCAGCACCAT GGCCTGAAAT AACCTCTGAA AGAGGAACTT GGTTAGGTAC 3350
CTTCTGAGGC GGAAAGAACC AGCTGTGGAA TGTGTGTCAG TTAGGGTGTG 3400
GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3450
AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG 3500
AAGTATGCAA AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC 3550
TAACTCCGCC CATCCCGCCC CTAACTCCGC CCAGTTCCGC CCATTCTCCG 3600
CCCCATGGCT GACTAATTTT TTTTATTTAT GCAGAGGCCG AGGCCGCCTC 3650
GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT GGAGGCCTAG 3700
GCTTTTGCAA AAAGCTGTTA ACAGCTTGGC ACTGGCCGTC GTTTTACAAC 3750
GTCGTGACTG GGAAAACCCT GGCGTTACCC AACTTAATCG CCTTGCAGCA 3800
CATCCCCCCT TCGCCAGCTG GCGTAATAGC GAAGAGGCCC GCACCGATCG 3850
CCCTTCCCAA CAGTTGCGTA GCCTGAATGG CGAATGGCGC CTGATGCGGT 3900
ATTTTCTCCT TACGCATCTG TGCGGTATTT CACACCGCAT ACGTCAAAGC 3950
AACCATAGTA CGCGCCCTGT AGCGGCGCAT TAAGCGCGGC GGGTGTGGTG 4000
GTTACGCGCA GCGTGACCGC TACACTTGCC AGCGCCCTAG CGCCCGCTCC 4050
TTTCGCTTTC TTCCCTTCCT TTCTCGCCAC GTTCGCCGGC TTTCCCCGTC 4100
AAGCTCTAAA TCGGGGGCTC CCTTTAGGGT TCCGATTTAG TGCTTTACGG 4150
CACCTCGACC CCAAAAAACT TGATTTGGGT GATGGTTCAC GTAGTGGGCC 4200
ATCGCCCTGA TAGACGGTTT TTCGCCCTTT GACGTTGGAG TCCACGTTCT 4250
TTAATAGTGG ACTCTTGTTC CAAACTGGAA CAACACTCAA CCCTATCTCG 4300
GGCTATTCTT TTGATTTATA AGGGATTTTG CCGATTTCGG CCTATTGGTT 4350
AAAAAATGAG CTGATTTAAC AAAAATTTAA CGCGAATTTT AACAAAATAT 4400
TAACGTTTAC AATTTTATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 4450
GCATAGTTAA GCCAACTCCG CTATCGCTAC GTGACTGGGT CATGGCTGCG 4500
CCCCGACACC CGCCAACACC CGCTGACGCG CCCTGACGGG CTTGTCTGCT 4550
CCCGGCATCC GCTTACAGAC AAGCTGTGAC CGTCTCCGGG AGCTGCATGT 4600
GTCAGAGGTT TTCACCGTCA TCACCGAAAC GCGCGAGGCA GTATTCTTGA 4650
AGACGAAAGG GCCTCGTGAT ACGCCTATTT TTATAGGTTA ATGTCATGAT 4700
AATAATGGTT TCTTAGACGT CAGGTGGCAC TTTTCGGGGA AATGTGCGCG 4750
GAACCCCTAT TTGTTTATTT TTCTAAATAC ATTCAAATAT GTATCCGCTC 4800
ATGAGACAAT AACCCTGATA AATGCTTCAA TAATATTGAA AAAGGAAGAG 4850
TATGAGTATT CAACATTTCC GTGTCGCCCT TATTCCCTTT TTTGCGGCAT 4900
TTTGCCTTCC TGTTTTTGCT CACCCAGAAA CGCTGGTGAA AGTAAAAGAT 4950
GCTGAAGATC AGTTGGGTGC ACGAGTGGGT TACATCGAAC TGGATCTCAA 5000
CAGCGGTAAG ATCCTTGAGA GTTTTCGCCC CGAAGAACGT TTTCCAATGA 5050
TGAGCACTTT TAAAGTTCTG CTATGTGGCG CGGTATTATC CCGTGATGAC 5100
GCCGGGCAAG AGCAACTCGG TCGCCGCATA CACTATTCTC AGAATGACTT 5150
GGTTGAGTAC TCACCAGTCA CAGAAAAGCA TCTTACGGAT GGCATGACAG 5200
TAAGAGAATT ATGCAGTGCT GCCATAACCA TGAGTGATAA CACTGCGGCC 5250
AACTTACTTC TGACAACGAT CGGAGGACCG AAGGAGCTAA CCGCTTTTTT 5300
GCACAACATG GGGGATCATG TAACTCGCCT TGATCGTTGG GAACCGGAGC 5350
TGAATGAAGC CATACCAAAC GACGAGCGTG ACACCACGAT GCCAGCAGCA 5400 
ATGGCAACAA CGTTGCGCAA ACTATTAACT GGCGAACTAC TTACTCTAGC 5450 
TTCCCGGCAA CAATTAATAG ACTGGATGGA GGCGGATAAA GTTGCAGGAC 5500 
CACTTCTGCG CTCGGCCCTT CCGGCTGGCT GGTTTATTGC TGATAAATCT 5550 
GGAGCCGGTG AGCGTGGGTC TCGCGGTATC ATTGCAGCAC TGGGGCCAGA 5600 
TGGTAAGCCC TCCCGTATCG TAGTTATCTA CACGACGGGG AGTCAGGCAA 5650 
CTATGGATGA ACGAAATAGA CAGATCGCTG AGATAGGTGC CTCACTGATT 5700 
AAGCATTGGT AACTGTCAGA CCAAGTTTAC TCATATATAC TTTAGATTGA 5750 
TTTAAAACTT CATTTTTAAT TTAAAAGGAT CTAGGTGAAG ATCCTTTTTG 5800 
ATAATCTCAT GACCAAAATC CCTTAACGTG AGTTTTCGTT CCACTGAGCG 5850 
TCAGACCCCG TAGAAAAGAT CAAAGGATCT TCTTGAGATC CTTTTTTTCT 5900 
GCGCGTAATC TGCTGCTTGC AAACAAAAAA ACCACCGCTA CCAGCGGTGG 5950 
TTTGTTTGCC GGATCAAGAG CTACCAACTC TTTTTCCGAA GGTAACTGGC 6000 
TTCAGCAGAG CGCAGATACC AAATACTGTC CTTCTAGTGT AGCCGTAGTT 6050 
AGGCCACCAC TTCAAGAACT CTGTAGCACC GCCTACATAC CTCGCTCTGC 6100 
TAATCCTGTT ACCAGTGGCT GCTGCCAGTG GCGATAAGTC GTGTCTTACC 6150 
GGGTTGGACT CAAGACGATA GTTACCGGAT AAGGCGCAGC GGTCGGGCTG 6200 
AACGGGGGGT TCGTGCACAC AGCCCAGCTT GGAGCGAACG ACCTACACCG 6250 
AACTGAGATA CCTACAGCGT GAGCATTGAG AAAGCGCCAC GCTTCCCGAA 6300 
GGGAGAAAGG CGGACAGGTA TCCGGTAAGC GGCAGGGTCG GAACAGGAGA 6350 
GCGCACGAGG GAGCTTCCAG GGGGAAACGC CTGGTATCTT TATAGTCCTG 6400 
TCGGGTTTCG CCACCTCTGA CTTGAGCGTC GATTTTTGTG ATGCTCGTCA 6450
GGGGGGCGGA GCCTATGGAA AAACGCCAGC AACGCGGCCT TTTTACGGTT 6500
CCTGGCCTTT TGCTGGCCTT TTGCTCACAT GTTCTTTCCT GCGTTATCCC 6550 
CTGATTCTGT GGATAACCGT ATTACCGCCT TTGAGTGAGC TGATACCGCT 6600 
CGCCGCAGCC GAACGACCGA GCGCAGCGAG TCAGTGAGCG AGGAAGCGGA 6650 
AGAGCGCCCA ATACGCAAAC CGCCTCTCCC CGCGCGTTGG CCGATTCATT 6700 
AATCCAGCTG GCACGACAGG TTTCCCGACT GGAAAGCGGG CAGTGAGCGC 6750 
AACGCAATTA ATGTGAGTTA CCTCACTCAT TAGGCACCCC AGGCTTTACA 6800 
CTTTATGCTT CCGGCTCGTA TGTTGTGTGG AATTGTGAGC GGATAACAAT 6850 
TTCACACAGG AAACAGCTAT GACCATGATT ACGAATTAA 6889 
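
By way of illustration only, and not as part of the disclosed method, the running position number printed at the right of each row of a sequence listing can be checked against the bases actually present in that row. The following Python sketch assumes rows of the form shown above (space-separated groups of bases followed by a cumulative count). The names check_listing, EXAMPLE_ROWS and declared_length are hypothetical and chosen only for this example; the two example rows are the first two rows of SEQ ID NO:2.

```python
import re

# One listing row: groups of letters separated by spaces, then a running count.
ROW = re.compile(r"^\s*([A-Za-z ]+?)\s+(\d+)\s*$")

# First two rows of SEQ ID NO:2 as printed in the listing above.
EXAMPLE_ROWS = [
    "TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50",
    "TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100",
]


def check_listing(rows, declared_length=None):
    """Return a list of problems found in sequence-listing rows (empty if none)."""
    problems = []
    total = 0
    for n, row in enumerate(rows, start=1):
        match = ROW.match(row)
        if match is None:
            problems.append(f"row {n}: cannot parse")
            continue
        bases = match.group(1).replace(" ", "").upper()
        printed = int(match.group(2))
        total += len(bases)
        # Anything other than A, C, G or T is reported rather than silently kept.
        unexpected = sorted(set(bases) - set("ACGT"))
        if unexpected:
            problems.append(f"row {n}: unexpected symbols {unexpected}")
        if total != printed:
            problems.append(f"row {n}: running count {total} != printed {printed}")
    if declared_length is not None and total != declared_length:
        problems.append(f"total {total} != declared length {declared_length}")
    return problems


if __name__ == "__main__":
    print(check_listing(EXAMPLE_ROWS, declared_length=100))
```

Applied to a complete listing, declared_length would be the value given under "(A) LENGTH" for that sequence, so that a dropped or duplicated base in any row is reported at the first row where the running count diverges.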

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6557 bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
TTCGAGCTCG CCCGACATTG ATTATTGACT AGAGTCGATC GACAGCTGTG 50 
GAATGTGTGT CAGTTAGGGT GTGGAAAGTC CCCAGGCTCC CCAGCAGGCA 100 
GAAGTATGCA AAGCATGCAT CTCAATTAGT CAGCAACCAG GTGTGGAAAG 150 
TCCCCAGGCT CCCCAGCAGG CAGAAGTATG CAAAGCATGC ATCTCAATTA 200 
GTCAGCAACC ATAGTCCCGC CCCTAACTCC GCCCATCCCG CCCCTAACTC 250 
CGCCCAGTTC CGCCCATTCT CCGCCCCATG GCTGACTAAT TTTTTTTATT 300 
TATGCAGAGG CCGAGGCCGC CTCGGCCTCT GAGCTATTCC AGAAGTAGTG 350 
AGGAGGCTTT TTTGGAGGCC TAGGCTTTTG CAAAAAGCTA GCTTATCCGG 400 
CCGGGAACGG TGCATTGGAA CGCGGATTCC CCGTGCCAAG AGTGACGTAA 450 
GTACCGCCTA TAGAGCGATA AGAGGATTTT ATCCCCGCTG CCATCATGGT 500
TCGACCATTG AACTGCATCG TCGCCGTGTC CCAAAATATG GGGATTGGCA 550
AGAACGGAGA CCTACCCTGG CCTCCGCTCA GGAACGAGTT CAAGTACTTC 600
CAAAGAATGA CCACAACCTC TTCAGTGGAA GGTAAACAGA ATCTGGTGAT 650
TATGGGTAGG AAAACCTGGT TCTCCATTCC TGAGAAGAAT CGACCTTTAA 700
AGGACAGAAT TAATATAGTT CTCAGTAGAG AACTCAAAGA ACCACCACGA 750
GGAGCTCATT TTCTTGCCAA AAGTTTGGAT GATGCCTTAA GACTTATTGA 800
ACAACCGGAA TTGGCAAGTA AAGTAGACAT GGTTTGGATA GTCGGAGGCA 850
GTTCTGTTTA CCAGGAAGCC ATGAATCAAC CAGGCCACCT TAGACTCTTT 900
GTGACAAGGA TCATGCAGGA ATTTGAAAGT GACACGTTTT TCCCAGAAAT 950
TGATTTGGGG AAATATAAAC CTCTCCCAGA ATACCCAGGC GTCCTCTCTG 1000
AGGTCCAGGA GGAAAAAGGC ATCAAGTATA AGTTTGAAGT CTACGAGAAG 1050
AAAGACTAAC AGGAAGATGC TTTCAAGTTC TCTGCTCCCC TCCTAAAGCT 1100
ATGCATTTTT ATAAGACCAT GGGACTTTTG CTGGCTTTAG ATCCCCTTGG 1150
CTTCGTTAGA ACGCAGCTAC AATTAATACA TAACCTTATG TATCATACAC 1200
ATACGATTTA GGTGACACTA TAGATAACAT CCACTTTGCC TTTCTCTCCA 1250
CAGGTGTCCA CTCCCAGGTC CAACTGCACC TCGGTTCTAT CGATTGAATT 1300
CCACCATGGG ATGGTCATGT ATCATCCTTT TTCTAGTAGC AACTGCAACT 1350
GGAGTACATT CAGAAGTTCA GCTGGTGGAG TCTGGCGGTG GCCTGGTGCA 1400
GCCAGGGGGC TCACTCCGTT TGTCCTGTGC AGTTTCTGGC TACTCCATCA 1450
CCTCCGGATA TAGCTGGAAC TGGATCCGTC AGGCCCCGGG TAAGGGCCTG 1500
GAATGGGTTG CATCGATTAC GTATGCCGGA TCGACTAACT ATAACCCTAG 1550
CGTCAAGGGC CGTATCACTA TAAGTCGCGA CGATTCCAAA AACACATTCT 1600
ACCTGCAGAT GAACAGCCTG CGTGCTGAGG ACACTGCCGT CTATTATTGT 1650
GCTCGAGGCA GCCACTATTT CGGCGCCTGG CACTTCGCCG TGTGGGGTCA 1700
AGGAACCCTG GTCACCGTCT CCTCGGCCTC CACCAAGGGC CCATCGGTCT 1750 
TCCCCCTGGC ACCCTCCTCC AAGAGCACCT CTGGGGGCAC AGCGGCCCTG 1800 
GGCTGCCTGG TCAAGGACTA CTTCCCCGAA CCGGTGACGG TGTCGTGGAA 1850 
CTCAGGCGCC CTGACCAGCG GCGTGCACAC CTTCCCGGCT GTCCTACAGT 1900 
CCTCAGGACT CTACTCCCTC AGCAGCGTGG TGACTGTGCC CTCTAGCAGC 1950 
TTGGGCACCC AGACCTACAT CTGCAACGTG AATCACAAGC CCAGCAACAC 2000 
CAAGGTGGAC AAGAAAGTTG AGCCCAAATC TTGTGACAAA ACTCACACAT 2050 
GCCCACCGTG CCCAGCACCT GAACTCCTGG GGGGACCGTC AGTCTTCCTC 2100 
TTCCCCCCAA AACCCAAGGA CACCCTCATG ATCTCCCGGA CCCCTGAGGT 2150 
CACATGCGTG GTGGTGGACG TGAGCCACGA AGACCCTGAG GTCAAGTTCA 2200 
ACTGGTACGT GGACGGCGTG GAGGTGCATA ATGCCAAGAC AAAGCCGCGG 2250 
GAGGAGCAGT ACAACAGCAC GTACCGTGTG GTCAGCGTCC TCACCGTCCT 2300 
GCACCAGGAC TGGCTGAATG GCAAGGAGTA CAAGTGCAAG GTCTCCAACA 2350 
AAGCCCTCCC AGCCCCCATC GAGAAAACCA TCTCCAAAGC CAAAGGGCAG 2400 
CCCCGAGAAC CACAGGTGTA CACCCTGCCC CCATCCCGGG AAGAGATGAC 2450 
CAAGAACCAG GTCAGCCTGA CCTGCCTGGT CAAAGGCTTC TATCCCAGCG 2500 
ACATCGCCGT GGAGTGGGAG AGCAATGGGC AGCCGGAGAA CAACTACAAG 2550 
ACCACGCCTC CCGTGCTGGA CTCCGACGGC TCCTTCTTCC TCTACAGCAA 2600 
GCTCACCGTG GACAAGAGCA GGTGGCAGCA GGGGAACGTC TTCTCATGCT 2650 
CCGTGATGCA TGAGGCTCTG CACAACCACT ACACGCAGAA GAGCCTCTCC 2700 
CTGTCTCCGG GTAAATGAGT GCGACGGCCC TAGAGTCGAC CTGCAGAAGC 2750 
TTGGCCGCCA TGGCCCAACT TGTTTATTGC AGCTTATAAT GGTTACAAAT 2800
AAAGCAATAG CATCACAAAT TTCACAAATA AAGCATTTTT TTCACTGCAT 2850
TCTAGTTGTG GTTTGTCCAA ACTCATCAAT GTATCTTATC ATGTCTGGAT 2900
CGATCGGGAA TTAATTCGGC GCAGCACCAT GGCCTGAAAT AACCTCTGAA 2950
AGAGGAACTT GGTTAGGTAC CTTCTGAGGC GGAAAGAACC AGCTGTGGAA 3000
TGTGTGTCAG TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA 3050
GTATGCAAAG CATGCATCTC AATTAGTCAG CAACCAGGTG TGGAAAGTCC 3100
CCAGGCTCCC CAGCAGGCAG AAGTATGCAA AGCATGCATC TCAATTAGTC 3150
AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC CTAACTCCGC 3200
CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3250
GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG 3300
AGGCTTTTTT GGAGGCCTAG GCTTTTGCAA AAAGCTGTTA CCTCGAGCGG 3350
CCGCTTAATT AAGGCGCGCC ATTTAAATCC TGCAGGTAAC AGCTTGGCAC 3400
TGGCCGTCGT TTTACAACGT CGTGACTGGG AAAACCCTGG CGTTACCCAA 3450
CTTAATCGCC TTGCAGCACA TCCCCCCTTC GCCAGCTGGC GTAATAGCGA 3500
AGAGGCCCGC ACCGATCGCC CTTCCCAACA GTTGCGTAGC CTGAATGGCG 3550
AATGGCGCCT GATGCGGTAT TTTCTCCTTA CGCATCTGTG CGGTATTTCA 3600
CACCGCATAC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA 3650
AGCGCGGCGG GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG 3700
CGCCCTAGCG CCCGCTCCTT TCGCTTTCTT CCCTTCCTTT CTCGCCACGT 3750
TCGCCGGCTT TCCCCGTCAA GCTCTAAATC GGGGGCTCCC TTTAGGGTTC 3800
CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG ATTTGGGTGA 3850
TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 3900
CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA 3950
ACACTCAACC CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC 4000
GATTTCGGCC TATTGGTTAA AAAATGAGCT GATTTAACAA AAATTTAACG 4050 
CGAATTTTAA CAAAATATTA ACGTTTACAA TTTTATGGTG CACTCTCAGT 4100 
ACAATCTGCT CTGATGCCGC ATAGTTAAGC CAACTCCGCT ATCGCTACGT 4150 
GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG CTGACGCGCC 4200 
CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 4250 
TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC 4300 
GCGAGGCAGT ATTCTTGAAG ACGAAAGGGC CTCGTGATAC GCCTATTTTT 4350 
ATAGGTTAAT GTCATGATAA TAATGGTTTC TTAGACGTCA GGTGGCACTT 4400 
TTCGGGGAAA TGTGCGCGGA ACCCCTATTT GTTTATTTTT CTAAATACAT 4450 
TCAAATATGT ATCCGCTCAT GAGACAATAA CCCTGATAAA TGCTTCAATA 4500 
ATATTGAAAA AGGAAGAGTA TGAGTATTCA ACATTTCCGT GTCGCCCTTA 4550 
TTCCCTTTTT TGCGGCATTT TGCCTTCCTG TTTTTGCTCA CCCAGAAACG 4600 
CTGGTGAAAG TAAAAGATGC TGAAGATCAG TTGGGTGCAC GAGTGGGTTA 4650 
CATCGAACTG GATCTCAACA GCGGTAAGAT CCTTGAGAGT TTTCGCCCCG 4700 
AAGAACGTTT TCCAATGATG AGCACTTTTA AAGTTCTGCT ATGTGGCGCG 4750 
GTATTATCCC GTGATGACGC CGGGCAAGAG CAACTCGGTC GCCGCATACA 4800 
CTATTCTCAG AATGACTTGG TTGAGTACTC ACCAGTCACA GAAAAGCATC 4850 
TTACGGATGG CATGACAGTA AGAGAATTAT GCAGTGCTGC CATAACCATG 4900 
AGTGATAACA CTGCGGCCAA CTTACTTCTG ACAACGATCG GAGGACCGAA 4950 
GGAGCTAACC GCTTTTTTGC ACAACATGGG GGATCATGTA ACTCGCCTTG 5000 
ATCGTTGGGA ACCGGAGCTG AATGAAGCCA TACCAAACGA CGAGCGTGAC 5050 
ACCACGATGC CAGCAGCAAT GGCAACAACG TTGCGCAAAC TATTAACTGG 5100
CGAACTACTT ACTCTAGCTT CCCGGCAACA ATTAATAGAC TGGATGGAGG 5150
CGGATAAAGT TGCAGGACCA CTTCTGCGCT CGGCCCTTCC GGCTGGCTGG 5200
TTTATTGCTG ATAAATCTGG AGCCGGTGAG CGTGGGTCTC GCGGTATCAT 5250
TGCAGCACTG GGGCCAGATG GTAAGCCCTC CCGTATCGTA GTTATCTACA 5300
CGACGGGGAG TCAGGCAACT ATGGATGAAC GAAATAGACA GATCGCTGAG 5350
ATAGGTGCCT CACTGATTAA GCATTGGTAA CTGTCAGACC AAGTTTACTC 5400
ATATATACTT TAGATTGATT TAAAACTTCA TTTTTAATTT AAAAGGATCT 5450
AGGTGAAGAT CCTTTTTGAT AATCTCATGA CCAAAATCCC TTAACGTGAG 5500
TTTTCGTTCC ACTGAGCGTC AGACCCCGTA GAAAAGATCA AAGGATCTTC 5550
TTGAGATCCT TTTTTTCTGC GCGTAATCTG CTGCTTGCAA ACAAAAAAAC 5600
CACCGCTACC AGCGGTGGTT TGTTTGCCGG ATCAAGAGCT ACCAACTCTT 5650
TTTCCGAAGG TAACTGGCTT CAGCAGAGCG CAGATACCAA ATACTGTCCT 5700
TCTAGTGTAG CCGTAGTTAG GCCACCACTT CAAGAACTCT GTAGCACCGC 5750
CTACATACCT CGCTCTGCTA ATCCTGTTAC CAGTGGCTGC TGCCAGTGGC 5800
GATAAGTCGT GTCTTACCGG GTTGGACTCA AGACGATAGT TACCGGATAA 5850
GGCGCAGCGG TCGGGCTGAA CGGGGGGTTC GTGCACACAG CCCAGCTTGG 5900
AGCGAACGAC CTACACCGAA CTGAGATACC TACAGCGTGA GCATTGAGAA 5950
AGCGCCACGC TTCCCGAAGG GAGAAAGGCG GACAGGTATC CGGTAAGCGG 6000
CAGGGTCGGA ACAGGAGAGC GCACGAGGGA GCTTCCAGGG GGAAACGCCT 6050
GGTATCTTTA TAGTCCTGTC GGGTTTCGCC ACCTCTGACT TGAGCGTCGA 6100
TTTTTGTGAT GCTCGTCAGG GGGGCGGAGC CTATGGAAAA ACGCCAGCAA 6150
CGCGGCCTTT TTACGGTTCC TGGCCTTTTG CTGGCCTTTT GCTCACATGT 6200
TCTTTCCTGC GTTATCCCCT GATTCTGTGG ATAACCGTAT TACCGCCTTT 6250
GAGTGAGCTG ATACCGCTCG CCGCAGCCGA ACGACCGAGC GCAGCGAGTC 6300
AGTGAGCGAG GAAGCGGAAG AGCGCCCAAT ACGCAAACCG CCTCTCCCCG 6350 
CGCGTTGGCC GATTCATTAA TCCAGCTGGC ACGACAGGTT TCCCGACTGG 6400 
AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTACC TCACTCATTA 6450 
GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6500 
TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC 6550 
GAATTAA 6557 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7305 bases

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 
TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50 
TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100 
TTACGGTAAA TGGCCCGCCT GGCTGACCGC CCAACGACCC CCGCCCATTG 150 
ACGTCAATAA TGACGTATGT TCCCATAGTA ACGCCAATAG GGACTTTCCA 200 
TTGACGTCAA TGGGTGGAGT ATTTACGGTA AACTGCCCAC TTGGCAGTAC 250 
ATCAAGTGTA TCATATGCCA AGTACGCCCC CTATTGACGT CAATGACGGT 300 
AAATGGCCCG CCTGGCATTA TGCCCAGTAC ATGACCTTAT GGGACTTTCC 350 
TACTTGGCAG TACATCTACG TATTAGTCAT CGCTATTACC ATGGTGATGC 400 
GGTTTTGGCA GTACATCAAT GGGCGTGGAT AGCGGTTTGA CTCACGGGGA 450 
TTTCCAAGTC TCCACCCCAT TGACGTCAAT GGGAGTTTGT TTTGGCACCA 500 
AAATCAACGG GACTTTCCAA AATGTCGTAA CAACTCCGCC CCATTGACGC 550 
AAATGGGCGG TAGGCGTGTA CGGTGGGAGG TCTATATAAG CAGAGCTCGT 600
TTAGTGAACC GTCAGATCGC CTGGAGACGC CATCCACGCT GTTTTGACCT 650
CCATAGAAGA CACCGGGACC GATCCAGCCT CCGCGGCCGG GAACGGTGCA 700
TTGGAACGCG GATTCCCCGT GCCAAGAGTG ACGTAAGTAC CGCCTATAGA 750
GTCTATAGGC CCACCCCCTT GGCTTCGTTA GAACGCGGCT ACAATTAATA 800
CATAACCTTA TGTATCATAC ACATACGATT TAGGTGACAC TATAGAATAA 850
CATCCACTTT GCCTTTCTCT CCACAGGTGT CCACTCCCAG GTCCAACTGC 900
ACCTCGGTTC TAAGCTTATC GATATGAAAA AGCCTGAACT CACCGCGACG 950
TCTGTCGAGA AGTTTCTGAT CGAAAAGTTC GACAGCGTCT CCGACCTGAT 1000
GCAGCTCTCG GAGGGCGAAG AATCTCGTGC TTTCAGCTTC GATGTAGGAG 1050
GGCGTGGATA TGTCCTGCGG GTAAATAGCT GCGCCGATGG TTTCTACAAA 1100
GATCGTTATG TTTATCGGCA CTTTGCATCG GCCGCGCTCC CGATTCCGGA 1150
AGTGCTTGAC ATTGGGGAAT TCAGCGAGAG CCTGACCTAT TGCATCTCCC 1200
GCCGTGCACA GGGTGTCACG TTGCAACACC TGCCTGAAAC CGAACTGCCC 1250
GCTGTTCTGC AGCCGGTCGC GGAGGCCATG GATGCGATCG CTGCGGCCGA 1300
TCTTAGCCAG ACGAGCGGGT TCGGCCCATT CGGACCGCAA GGAATCGGTC 1350
AATACACTAC ATGGCGTGAT TTCATATGCG CGATTGCTGA TCCCCATGTG 1400
TATCACTGGC AAACTGTGAT GGACGACACC GTCAGTGCGT CCGTCGCGCA 1450
GGCTCTCGAT GAGCTGATGC TTTGGGCCGA GGACTGCCCC GAAGTCCGGC 1500
ACCTCGTGCA CGCGGATTTC GGCTCCAACA ATGTCCTGAC GGACAATGGC 1550
CGCATAACAG CGGTCATTGA CTGGAGCGAG GCGATGTTCG GGGATTCCCA 1600
ATACGAGGTC GCCAACATCT TCTTCTGGAG GCCGTGGTTG GCTTGTATGG 1650
AGCAGCAGAC GTACTTCGAG CGGAGGCATC CGGAGCTTGC AGGATCGCCG 1700
CGGCTCCGGG CGTATATGCT CCGCATTGGT CTTGACCAAC TCTATCAGAG 1750
CTTGGTTGAC GGCAATTTCG ATGATGCAGC TTGGGCGCAG GGTCGATGCG 1800
ACGCAATCGT CCGATCCGGA GCCGGGACTG TCGGGCGTAC ACAAATCGCC 1850 
CGCAGAAGCG CGGCCGTCTG GACCGATGGC TGTGTAGAAG TACTCGCCGA 1900 
TAGTGGAAAC CGACGCCCCA GCACTCGTCC GAGGGCAAAG GAATAGAGTA 1950 
GATGCCGACC GAAGGATCCC CGGGGAATTC AATCGATGGC CGCCATGGCC 2000 
CAACTTGTTT ATTGCAGCTT ATAATGGTTA CAAATAAAGC AATAGCATCA 2050 
CAAATTTCAC AAATAAAGCA TTTTTTTCAC TGCATTCTAG TTGTGGTTTG 2100 
TCCAAACTCA TCAATGTATC TTATCATGTC TGGATCGATC GGGAATTAAT 2150 
TCGGCGCAGC ACCATGGCCT GAAATAACCT CTGAAAGAGG AACTTGGTTA 2200 
GGTACCTTCT GAGGCGGAAA GAACCAGCTG TGGAATGTGT GTCAGTTAGG 2250 
GTGTGGAAAG TCCCCAGGCT CCCCAGCAGG CAGAAGTATG CAAAGCATGC 2300 
ATCTCAATTA GTCAGCAACC AGGTGTGGAA AGTCCCCAGG CTCCCCAGCA 2350 
GGCAGAAGTA TGCAAAGCAT GCATCTCAAT TAGTCAGCAA CCATAGTCCC 2400 
GCCCCTAACT CCGCCCATCC CGCCCCTAAC TCCGCCCAGT TCCGCCCATT 2450 
CTCCGCCCCA TGGCTGACTA ATTTTTTTTA TTTATGCAGA GGCCGAGGCC 2500 
GCCTCGGCCT CTGAGCTATT CCAGAAGTAG TGAGGAGGCT TTTTTGGAGG 2550 
CCTAGGCTTT TGCAAAAAGC TAGCTTATCC GGCCGGGAAC GGTGCATTGG 2600 
AACGCGGATT CCCCGTGCCA AGAGTCAGGT AAGTACCGCC TATAGAGTCT 2650 
ATAGGCCCAC CCCCTTGGCT TCGTTAGAAC GCGGCTACAA TTAATACATA 2700 
ACCTTTTGGA TCGATCCTAC TGACACTGAC ATCCACTTTT TCTTTTTCTC 2750 
CACAGGTGTC CACTCCCAGG TCCAACTGCA CCTCGGTTCG CGAAGCTAGC 2800 
TTGGGCTGCA TCGATTGAAT TCCACCATGG GATGGTCATG TATCATCCTT 2850 
TTTCTAGTAG CAACTGCAAC TGGAGTACAT TCAGATATCC AGCTGACCCA 2900
GTCCCCGAGC TCCCTGTCCG CCTCTGTGGG CGATAGGGTC ACCATCACCT 2950
GCCGTGCCAG TCAGAGCGTC GATTACGATG GTGATAGCTA CATGAACTGG 3000
TATCAACAGA AACCAGGAAA AGCTCCGAAA CTACTGATTT ACGCGGCCTC 3050
GTACCTGGAG TCTGGAGTCC CTTCTCGCTT CTCTGGATCC GGTTCTGGGA 3100
CGGATTTCAC TCTGACCATC AGCAGTCTGC AGCCGGAAGA CTTCGCAACT 3150
TATTACTGTC AGCAAAGTCA CGAGGATCCG TACACATTTG GACAGGGTAC 3200
CAAGGTGGAG ATCAAACGAA CTGTGGCTGC ACCATCTGTC TTCATCTTCC 3250
CGCCATCTGA TGAGCAGTTG AAATCTGGAA CTGCCTCTGT TGTGTGCCTG 3300
CTGAATAACT TCTATCCCAG AGAGGCCAAA GTACAGTGGA AGGTGGATAA 3350
CGCCCTCCAA TCGGGTAACT CCCAGGAGAG TGTCACAGAG CAGGACAGCA 3400
AGGACAGCAC CTACAGCCTC AGCAGCACCC TGACGCTGAG CAAAGCAGAC 3450
TACGAGAAAC ACAAAGTCTA CGCCTGCGAA GTCACCCATC AGGGCCTGAG 3500
CTCGCCCGTC ACAAAGAGCT TCAACAGGGG AGAGTGTTAA GCTTCGATGG 3550
CCGCCATGGC CCAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG 3600
CAATAGCATC ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA 3650
GTTGTGGTTT GTCCAAACTC ATCAATGTAT CTTATCATGT CTGGATCGAT 3700
CGGGAATTAA TTCGGCGCAG CACCATGGCC TGAAATAACC TCTGAAAGAG 3750
GAACTTGGTT AGGTACCTTC TGAGGCGGAA AGAACCAGCT GTGGAATGTG 3800
TGTCAGTTAG GGTGTGGAAA GTCCCCAGGC TCCCCAGCAG GCAGAAGTAT 3850
GCAAAGCATG CATCTCAATT AGTCAGCAAC CAGGTGTGGA AAGTCCCCAG 3900
GCTCCCCAGC AGGCAGAAGT ATGCAAAGCA TGCATCTCAA TTAGTCAGCA 3950
ACCATAGTCC CGCCCCTAAC TCCGCCCATC CCGCCCCTAA CTCCGCCCAG 4000
TTCCGCCCAT TCTCCGCCCC ATGGCTGACT AATTTTTTTT ATTTATGCAG 4050
AGGCCGAGGC CGCCTCGGCC TCTGAGCTAT TCCAGAAGTA GTGAGGAGGC 4100
TTTTTTGGAG GCCTAGGCTT TTGCAAAAAG CTGTTAACAG CTTGGCACTG 4150 
GCCGTCGTTT TACAACGTCG TGACTGGGAA AACCCTGGCG TTACCCAACT 4200 
TAATCGCCTT GCAGCACATC CCCCCTTCGC CAGCTGGCGT AATAGCGAAG 4250 
AGGCCCGCAC CGATCGCCCT TCCCAACAGT TGCGTAGCCT GAATGGCGAA 4300 
TGGCGCCTGA TGCGGTATTT TCTCCTTACG CATCTGTGCG GTATTTCACA 4350 
CCGCATACGT CAAAGCAACC ATAGTACGCG CCCTGTAGCG GCGCATTAAG 4400 
CGCGGCGGGT GTGGTGGTTA CGCGCAGCGT GACCGCTACA CTTGCCAGCG 4450 
CCCTAGCGCC CGCTCCTTTC GCTTTCTTCC CTTCCTTTCT CGCCACGTTC 4500 
GCCGGCTTTC CCCGTCAAGC TCTAAATCGG GGGCTCCCTT TAGGGTTCCG 4550 
ATTTAGTGCT TTACGGCACC TCGACCCCAA AAAACTTGAT TTGGGTGATG 4600 
GTTCACGTAG TGGGCCATCG CCCTGATAGA CGGTTTTTCG CCCTTTGACG 4650 
TTGGAGTCCA CGTTCTTTAA TAGTGGACTC TTGTTCCAAA CTGGAACAAC 4700 
ACTCAACCCT ATCTCGGGCT ATTCTTTTGA TTTATAAGGG ATTTTGCCGA 4750 
TTTCGGCCTA TTGGTTAAAA AATGAGCTGA TTTAACAAAA ATTTAACGCG 4800 
AATTTTAACA AAATATTAAC GTTTACAATT TTATGGTGCA CTCTCAGTAC 4850 
AATCTGCTCT GATGCCGCAT AGTTAAGCCA ACTCCGCTAT CGCTACGTGA 4900 
CTGGGTCATG GCTGCGCCCC GACACCCGCC AACACCCGCT GACGCGCCCT 4950 
GACGGGCTTG TCTGCTCCCG GCATCCGCTT ACAGACAAGC TGTGACCGTC 5000 
TCCGGGAGCT GCATGTGTCA GAGGTTTTCA CCGTCATCAC CGAAACGCGC 5050 
GAGGCAGTAT TCTTGAAGAC GAAAGGGCCT CGTGATACGC CTATTTTTAT 5100 
AGGTTAATGT CATGATAATA ATGGTTTCTT AGACGTCAGG TGGCACTTTT 5150 
CGGGGAAATG TGCGCGGAAC CCCTATTTGT TTATTTTTCT AAATACATTC 5200
AAATATGTAT CCGCTCATGA GACAATAACC CTGATAAATG CTTCAATAAT 5250
ATTGAAAAAG GAAGAGTATG AGTATTCAAC ATTTCCGTGT CGCCCTTATT 5300
CCCTTTTTTG CGGCATTTTG CCTTCCTGTT TTTGCTCACC CAGAAACGCT 5350
GGTGAAAGTA AAAGATGCTG AAGATCAGTT GGGTGCACGA GTGGGTTACA 5400 
TCGAACTGGA TCTCAACAGC GGTAAGATCC TTGAGAGTTT TCGCCCCGAA 5450 
GAACGTTTTC CAATGATGAG CACTTTTAAA GTTCTGCTAT GTGGCGCGGT 5500 
ATTATCCCGT GATGACGCCG GGCAAGAGCA ACTCGGTCGC CGCATACACT 5550 
ATTCTCAGAA TGACTTGGTT GAGTACTCAC CAGTCACAGA AAAGCATCTT 5600 
ACGGATGGCA TGACAGTAAG AGAATTATGC AGTGCTGCCA TAACCATGAG 5650 
TGATAACACT GCGGCCAACT TACTTCTGAC AACGATCGGA GGACCGAAGG 5700 
AGCTAACCGC TTTTTTGCAC AACATGGGGG ATCATGTAAC TCGCCTTGAT 5750 
CGTTGGGAAC CGGAGCTGAA TGAAGCCATA CCAAACGACG AGCGTGACAC 5800 
CACGATGCCA GCAGCAATGG CAACAACGTT GCGCAAACTA TTAACTGGCG 5850 
AACTACTTAC TCTAGCTTCC CGGCAACAAT TAATAGACTG GATGGAGGCG 5900 
GATAAAGTTG CAGGACCACT TCTGCGCTCG GCCCTTCCGG CTGGCTGGTT 5950 
TATTGCTGAT AAATCTGGAG CCGGTGAGCG TGGGTCTCGC GGTATCATTG 6000 
CAGCACTGGG GCCAGATGGT AAGCCCTCCC GTATCGTAGT TATCTACACG 6050 
ACGGGGAGTC AGGCAACTAT GGATGAACGA AATAGACAGA TCGCTGAGAT 6100 
AGGTGCCTCA CTGATTAAGC ATTGGTAACT GTCAGACCAA GTTTACTCAT 6150 
ATATACTTTA GATTGATTTA AAACTTCATT TTTAATTTAA AAGGATCTAG 6200 
GTGAAGATCC TTTTTGATAA TCTCATGACC AAAATCCCTT AACGTGAGTT 6250 
TTCGTTCCAC TGAGCGTCAG ACCCCGTAGA AAAGATCAAA GGATCTTCTT 6300 
GAGATCCTTT TTTTCTGCGC GTAATCTGCT GCTTGCAAAC AAAAAAACCA 6350
CCGCTACCAG CGGTGGTTTG TTTGCCGGAT CAAGAGCTAC CAACTCTTTT 6400
TCCGAAGGTA ACTGGCTTCA GCAGAGCGCA GATACCAAAT ACTGTCCTTC 6450
TAGTGTAGCC GTAGTTAGGC CACCACTTCA AGAACTCTGT AGCACCGCCT 6500
ACATACCTCG CTCTGCTAAT CCTGTTACCA GTGGCTGCTG CCAGTGGCGA 6550
TAAGTCGTGT CTTACCGGGT TGGACTCAAG ACGATAGTTA CCGGATAAGG 6600
CGCAGCGGTC GGGCTGAACG GGGGGTTCGT GCACACAGCC CAGCTTGGAG 6650
CGAACGACCT ACACCGAACT GAGATACCTA CAGCGTGAGC ATTGAGAAAG 6700
CGCCACGCTT CCCGAAGGGA GAAAGGCGGA CAGGTATCCG GTAAGCGGCA 6750
GGGTCGGAAC AGGAGAGCGC ACGAGGGAGC TTCCAGGGGG AAACGCCTGG 6800
TATCTTTATA GTCCTGTCGG GTTTCGCCAC CTCTGACTTG AGCGTCGATT 6850
TTTGTGATGC TCGTCAGGGG GGCGGAGCCT ATGGAAAAAC GCCAGCAACG 6900
CGGCCTTTTT ACGGTTCCTG GCCTTTTGCT GGCCTTTTGC TCACATGTTC 6950
TTTCCTGCGT TATCCCCTGA TTCTGTGGAT AACCGTATTA CCGCCTTTGA 7000
GTGAGCTGAT ACCGCTCGCC GCAGCCGAAC GACCGAGCGC AGCGAGTCAG 7050
TGAGCGAGGA AGCGGAAGAG CGCCCAATAC GCAAACCGCC TCTCCCCGCG 7100
CGTTGGCCGA TTCATTAATC CAGCTGGCAC GACAGGTTTC CCGACTGGAA 7150
AGCGGGCAGT GAGCGCAACG CAATTAATGT GAGTTACCTC ACTCATTAGG 7200
CACCCCAGGC TTTACACTTT ATGCTTCCGG CTCGTATGTT GTGTGGAATT 7250
GTGAGCGGAT AACAATTTCA CACAGGAAAC AGCTATGACC ATGATTACGA 7300
ATTAA 7305

CLAIMS

1. A DNA construct comprising a transcriptional initiation site, a
transcriptional termination site, a selectable gene, a product gene
provided 3' to the selectable gene, a transcriptional regulatory
region regulating transcription of both the selectable gene and the
product gene, the selectable gene being positioned within an intron
having a splice donor site 5' of the intron, which splice donor site
regulates expression of the product gene using the transcriptional
regulatory region.

2. The DNA construct of claim 1 wherein the splice donor site comprises 
an efficient splice donor sequence. 

3. The DNA construct of claim 2 wherein the splice donor site comprises
a consensus splice donor sequence. 

4. The DNA construct of claim 2 wherein the splice donor site comprises 
the sequence GACGTAAGT.

5. The DNA construct of claim 1 wherein the selectable gene is an 
amplifiable gene. 

6. The DNA construct of claim 5 wherein the amplifiable gene is DHFR. 

7. The DNA construct of claim 1 wherein the transcriptional regulatory 
region comprises a promoter and an enhancer. 

8. A vector comprising the DNA construct of claim 1. 

9. The vector of claim 8 wherein the selectable gene of the DNA 
construct is an amplifiable gene. 

10. The vector of claim 8 that is capable of replication in a eukaryotic
host.

11. A eukaryotic host cell comprising the vector of claim 10. 

12. A eukaryotic host cell comprising the DNA construct of claim 5. 

13. The host cell of claim 11 wherein the vector is introduced into the 
host cell by electroporation. 

14. A eukaryotic host cell comprising the DNA construct of claim 1
integrated into a chromosome of the host cell.

15. The host cell of claim 14 that is a mammalian cell.

16. A method for producing a product of interest comprising culturing the 
host cell of claim 11 so as to express the product gene and 
recovering the product from the host cell culture. 

17. The method of claim 16 further comprising recovering the product from
the culture medium. 

18. The method of claim 16 wherein the selectable gene is an amplifiable 
gene and the splice donor site comprises an efficient splice donor 
sequence.

19. A method for producing a product of interest comprising culturing the 
host cell of claim 12 so as to express the product gene in a 
selective medium comprising an amplifying agent for sufficient time 
to allow amplification to occur, and recovering the product. 

20. A method for producing eukaryotic cells having multiple copies of a 
product gene comprising transforming eukaryotic cells with the DNA 
construct of claim 5, growing the cells in a selective medium 
comprising an amplifying agent for a sufficient time for 
amplification to occur, and selecting cells having multiple copies 
of the product gene. 

21. The method of claim 20 further comprising recovering from the 
selected cells the product of interest. 
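
By way of illustration only, and forming no part of the claims, the nine-base sequence recited in claim 4 can be located in a printed sequence by a plain string search. The Python sketch below makes several assumptions: the function name find_motif and the offset parameter are hypothetical, and the fragment is copied from positions 431 to 470 of SEQ ID NO: 3 as printed in the listing. The sketch reports only where the string occurs; it does not by itself establish that an occurrence functions as a splice donor site.

```python
def find_motif(sequence, motif, offset=0):
    """Return 1-based start positions of every (possibly overlapping) motif hit.

    Whitespace in `sequence` is ignored, so rows can be pasted as printed.
    `offset` is the number of bases that precede the fragment in the full
    sequence, so that reported positions use the full-sequence numbering.
    """
    seq = "".join(sequence.split()).upper()
    motif = motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(offset + start + 1)
        start = seq.find(motif, start + 1)
    return hits


# Positions 431-470 of SEQ ID NO: 3, copied from the listing.
FRAGMENT = "CCGTGCCAAG AGTGACGTAA GTACCGCCTA TAGAGCGATA"
DONOR = "GACGTAAGT"  # sequence recited in claim 4

if __name__ == "__main__":
    print(find_motif(FRAGMENT, DONOR, offset=430))
```

With offset=430 the search reports a single occurrence beginning at position 444 of SEQ ID NO: 3.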